Cloning multiple control sequences into chromosomes or into artificial centromeres

ABSTRACT

Artificially synthesizing 29 homozygous cystic fibrosis core panel controls demonstrates that multiple homozygous mutant sequences can be placed on the same control DNA sequence. This streamlines quality control by minimizing extra control assays, time, and costly formatted test materials, while allowing all controls to be tested during every test. Any rare or unavailable reported DNA sequence can be PCR amplified from total genomic DNA using primer pairs synthesized to carry the designated mutation or variant sequence, paired with adjacent upstream and downstream primers that amplify the target sequences.

This application claims priority to U.S. Ser. No. 11/074,265, filed on Mar. 7, 2005, which is a continuation of Ser. No. 10/236,168, filed on Sep. 4, 2002. This application also claims priority to U.S. Ser. No. 60/625,863, entitled CONSTRUCTING CELLS WITH HOMOZYGOUS MUTATIONS, filed Nov. 8, 2004, which is incorporated herein by reference.

SUMMARY

This application teaches that cell lines can be constructed with selected portions of the genes to be tested deleted, resulting in heterozygous, homozygous, or hemizygous gene deletions. These cell lines can be used directly as controls or modified further by inserting mutant gene sequences, so that the total DNA isolated from the cell line(s) controls for unknown or complex homologous genomic DNA sequences. This approach optimizes gene deletion construction when the whole gene is long and/or the targeted analysis spans only a portion of the tested gene. This application also teaches cloning multiple control sequences into vectors that can be incorporated into artificial chromosome centromeres or into chromosomal DNA targets on normal or abnormal chromosomes of the parent cell line, and selecting for the cells that have incorporated these sequences. The constructed cell lines grow and divide with a predetermined ratio of multiple control DNA sequences to total genomic DNA sequences. Artificial mixtures of independently derived DNAs can also be constituted in predetermined ratios to achieve the same ends. The cell lines are grown, and their DNA is extracted, quantified, and used directly. In one application, the homozygous mutant and heterozygous controls are tested in two separate control tubes from the beginning of the assay, alongside the unknown samples.

Selected partial or complete gene regions may be deleted by recombination, for example through an O-type insertion event or an Ω-type insertion event. The chromosomal DNA (t) and inert construct (v) may be any length, so these labels may be reversed in the Ω-type insertion event. The replacement fragment may be any group of one or more gene fragments spliced in any order, with one or more mutations on each fragment, so long as the end fragments are targeted to the sequences flanking the gene region to be replaced.

DEFINITIONS

Genetic locus: may be intragenic or extragenic, exonic or intronic, upstream or downstream promoter, centromeric or telomeric, nuclear or mitochondrial or viral.

A homologous locus may be a pseudogene or any other sequence homologous (similar) to the gene or polymorphic locus being tested that could interfere with analysis by acting as a binding site for one or both of a pair of PCR primers, by binding substantially to a hybridization probe during Southern or Northern blot analysis, by binding a pair of PCR primers so that the intervening sequence is amplified, or by being sufficiently homologous to compete for binding with the unknown sequence being detected.

SUMMARY OF RELATED ART

Currently available controls for genetic and infectious disease testing are rare resources that might be obtained in small quantities from very cooperative laboratories already doing the test. These materials are typically in short supply because they are the nucleic acid remaining after disease testing of a small tissue or body fluid aliquot submitted to the laboratory from a patient being tested for a suspected genetic disease gene or infectious disease organism. Furthermore, laboratories often require signed consent forms or other prescribed protocols approved by Institutional Review Boards (Human Experimentation Committees) in order to use these excess materials during subsequent testing of patient samples. Thus, after validating any test, the laboratory categorizes and stores these valuable control stocks for ready access during subsequent testing. When a laboratory exhausts its original control supply, it attempts to replace it with nucleic acid from another sample that has been found, since the test was introduced, to carry the desired sequence(s). Another approach among cooperating laboratories has been to submit patient lymphocyte samples with abnormal nucleic acid sequences to the Coriell Institute or the American Type Culture Collection for maintenance and redistribution, for a fee, to qualified laboratories doing research or testing. The repository transforms (immortalizes) the submitted patient's lymphocytes with Epstein-Barr virus so that the cell line can be propagated nearly indefinitely. However, immortalized cell lines become aneuploid and then accumulate additional chromosome rearrangements as the cells are propagated. Similarly, selected infectious disease agents can be purchased through facilities that routinely handle and distribute biohazardous material. Typically, federal contracts, grants, and individual user fees maintain these resource facilities.

Although these facilities perform a very important function, the characteristics of the controls provided in the distributed cell lines may not be optimal for each subsequent assay. For instance, one excellent control for an autosomal recessive genetic disease assay that simultaneously tests the normal and abnormal sequences at 25 different gene sites is one or a few cloned DNA fragments of sufficient length, spliced in any order, that carry each mutation to be tested and as few normal sequences as possible at the tested mutation sites (see Lebo et al., U.S. patent application Ser. No. 10/236,168, which is incorporated herein by reference). A second control could be heterozygous sequences with equal numbers of normal and mutant sites, to determine whether each is quantified correctly in any unknown sample. A third control could be a normal control DNA with no mutations, tested in a separate reaction mixture. In contrast, using only a transformed cell line from a heterozygous R116H carrier would be a suboptimal test control, because it could not determine whether the assay distinguishes a homozygous genotype with two R116H alleles from a heterozygous genotype with one R116H allele and one normal allele. In fact, during the development of cystic fibrosis controls toward enabling this patent, at least one mutation site tested by the multiple mutation test strip of one large manufacturer could not distinguish a heterozygous genotype (one abnormal and one normal allele) from a homozygous mutant genotype (two abnormal alleles). The sequences at this location therefore had to be redesigned in the multiplex assay to correct this previously undetected deficiency. Furthermore, controls were not available for every mutation on the American College of Medical Genetics (ACMG) cystic fibrosis panel.
When the ACMG panel was first adopted, controls for only about seven of the 25 mutations were available. Even after two years of collecting and transforming cell lines with the ACMG cystic fibrosis mutations at the Coriell Institute, four mutations were still unavailable to laboratories purchasing patient cell lines as controls for commercial cystic fibrosis test kits. Moreover, very few of the purchased cell lines were homozygous mutant, so the heterozygous controls could not determine whether the controlled test could distinguish a homozygous patient sample from a heterozygous one.

Even if 25 cell lines with homozygous cystic fibrosis mutations at all the tested gene sites had been available when the ACMG panel was first tested, together they would not have provided an optimal control, because each cell line would carry homozygous normal sequences at all but one or two of the other gene loci tested. Therefore only one homozygous mutant site could be tested with each available control cell line to determine whether the assay was specific enough to distinguish a homozygous from a heterozygous patient sample. At the same time, only the ΔF508 cell line was homozygous for any mutation, so that was the only site at which the assay could be checked for distinguishing a homozygote from a heterozygote. Finally, only one to three mutation control cell lines could have been tested per cystic fibrosis assay, because as DNAs from multiple cell lines are mixed, the number of normal allelic sites increases relative to the mutant allelic sites, so the signal from the normal sequences becomes ever more intense relative to the abnormal test signal. Because the cost of an individual commercially produced multiplex assay is a major proportion of the reimbursed test fee, testing every available cell line to constitute an entire 25-mutation panel during every assay becomes prohibitively expensive in reagents and assay time. Therefore many CAP (College of American Pathologists) accredited laboratories have been rotating designated subsets of known mutations when completing individual assays. This protocol cannot detect a test failure at any mutation that lacks a suitable control in that particular assay.
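The dilution argument above can be made concrete with a short calculation. The following Python sketch assumes an idealized pool in which equal amounts of DNA from each diploid cell line are mixed, each line is mutant at exactly one tested site and normal at all others, and every allele amplifies and hybridizes equally well. The function name is hypothetical.

```python
from fractions import Fraction


def mutant_fraction_at_site(n_lines: int, mutant_alleles_in_line: int = 2) -> Fraction:
    """Fraction of alleles at one tested site that are mutant when equal DNA
    amounts from n_lines diploid cell lines are pooled and only one of those
    lines carries the mutation at that site (idealized model)."""
    if n_lines < 1:
        raise ValueError("need at least one cell line")
    if mutant_alleles_in_line not in (1, 2):
        raise ValueError("a diploid line carries 1 (het) or 2 (hom) mutant alleles")
    total_alleles = 2 * n_lines  # two alleles per diploid cell line at this site
    return Fraction(mutant_alleles_in_line, total_alleles)


# A homozygous mutant line alone gives a pure mutant signal at its site;
# pooling it with other single-mutation lines dilutes that signal as 1/n.
for n in (1, 3, 25):
    print(n, mutant_fraction_at_site(n))
# 1 1
# 3 1/3
# 25 1/25
```

By contrast, a single construct carrying every tested mutation keeps the mutant fraction at 1 at every site, which is the motivation for the multiple-control design.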

By contrast, 25 homozygous controls have been synthesized on 17 DNA fragments cloned into 9 vector sites. Typically, weekly PCR assays amplified all 25 mutations in three different Innogenetics PCR amplification reactions, and the amplified products were tested on two Innogenetics test strips (Lebo et al., manuscript attached). This control is optimal for the Innogenetics platform because it confirms during each assay that homozygous mutations give homozygous test results and do not hybridize nonspecifically to normal-sequence locations to give heterozygous results. On other occasions, as many as 100 heterozygous mutation controls were synthesized to optimally control each mass spectrometry assay, which detects and quantifies each pair of mutant and corresponding normal sequences independently or in preselected multiplex reactions. A heterozygous control is optimal for mass spectrometry because the peak locations reflecting the molecular time-of-flight of both the mutant and the normal sequence are measured during each independent analysis.
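The check that a homozygous control provides on a reverse-hybridization strip can be sketched as a simple rule. The Python below is a hypothetical illustration, not the Innogenetics interpretation procedure: it assumes each site yields a normal-probe and a mutant-probe intensity, calls a genotype from the mutant signal fraction using placeholder cut-offs, and flags any site of a homozygous mutant control that is not called homozygous mutant.

```python
from typing import Dict, Tuple


def call_genotype(normal_signal: float, mutant_signal: float,
                  low: float = 0.2, high: float = 0.8) -> str:
    """Call one site from normal-probe and mutant-probe signal intensities.
    `low` and `high` are hypothetical cut-offs on the mutant signal fraction."""
    total = normal_signal + mutant_signal
    if total <= 0:
        return "no_call"
    fraction_mutant = mutant_signal / total
    if fraction_mutant >= high:
        return "hom_mutant"
    if fraction_mutant <= low:
        return "hom_normal"
    return "het"


def failed_sites(control: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """For a homozygous mutant control, return every site NOT called
    homozygous mutant, e.g. because a mutant sequence hybridized
    nonspecifically to the normal-sequence probe and looked heterozygous."""
    failures = {}
    for site, (normal, mutant) in control.items():
        call = call_genotype(normal, mutant)
        if call != "hom_mutant":
            failures[site] = call
    return failures


# Hypothetical strip readings as (normal, mutant) intensities per site.
readings = {"site_1": (5.0, 95.0), "site_2": (45.0, 55.0)}
print(failed_sites(readings))  # {'site_2': 'het'}
```

A heterozygous-only control could never trigger this flag, because a heterozygous call is its expected result.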

Artificial chromosomes have been constructed using centromeric alpha-satellite sequences, which consist of 171 bp monomeric repeat units usually arranged head-to-tail in long arrays [Willard H F (1991) Evolution of alpha satellite. Current Opinion in Genetics and Development 1:509-514]. One means to construct artificial chromosomes is to add cloned repeated centromeric sequences to cells [Willard H F (1998) Centromeres: the missing link in the development of human artificial chromosomes. Curr Opin Genet Dev 8(2):219-225; Grimes B R, Rhoades A A, Willard H F (2002) Alpha-satellite DNA and vector composition influence rates of human artificial chromosome formation. Mol Ther 5(6):798-805; Choo K H A (1997) The Centromere. Oxford University Press]. These published reports are incorporated herein by reference in their entirety. Selected control sequences can be added to artificial chromosomes, including genes intended to correct genetic defects. This approach typically includes selectable gene markers that confer resistance to a selecting agent, so that host cells lacking them cannot survive when grown under selective conditions.

At the same time, mouse models of human genetic disease have been constructed from normal mouse embryonic stem cells using hit-and-run vectors. Circular vectors carrying unique genomic target DNA, typically including a mutation, along with selectable genes such as neomycin resistance (selected with G418) or hygromycin resistance, can be inserted into a host cell genome by homologous recombination. The cells are then grown in medium that kills cells lacking the selectable gene. The surviving cells carry the inserted vector, typically at the targeted site. The antibiotic-resistant cells can then be transfected with a plasmid encoding Cre recombinase, which mediates site-specific recombination between loxP sites to remove up to 200 kb of targeted DNA. This method provides a means to introduce homozygous mutations or large partial gene deletions into (1) normal cells from any normal individual of the species, to construct heterozygous cell lines, and (2) cell lines from distantly related organisms, so that only the mutant sequences are detected and tested by the assay to be controlled.
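The outcome of the Cre step can be pictured as a string operation. The Python sketch below assumes two directly repeated loxP sites flanking the DNA to be removed and ignores strand chemistry; it illustrates the result of recombination, not its mechanism. The 34 bp loxP sequence is the standard one; the flanking sequences are arbitrary placeholders.

```python
# loxP: 13 bp inverted repeat, 8 bp asymmetric spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(seq: str, site: str = LOXP) -> str:
    """Model Cre-mediated excision: when two loxP sites lie in the same
    (direct) orientation, the DNA between them is looped out and a single
    loxP site remains. Inverted sites, which Cre would invert rather than
    excise, are not modeled."""
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq  # fewer than two sites: nothing to excise
    return seq[:first] + site + seq[second + len(site):]


floxed = "CCCC" + LOXP + "GGGGGGGG" + LOXP + "TTTT"
print(cre_excise(floxed) == "CCCC" + LOXP + "TTTT")  # True
```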

Two permanent cell lines have been reported that neither change their chromosome number nor spontaneously rearrange their chromosomes by reciprocal translocation, inversion, or insertion. A male and a female cell line (GM130 and GM131) have been found to maintain a very stable chromosome complement, typically through four months of continuous suspension culture (Lebo R V, Bruce B D: Gene mapping with sorted chromosomes. Methods in Enzymology 151:292-313, 1987; Lebo R V: Gene mapping strategies and flow cytogenetics. In: Gray J W (ed) Flow Cytogenetics. Academic Press, New York, Chapter 14, pp 225-242, 1989; incorporated herein by reference). Because the male cell line is hemizygous for X-linked genes, including the HGPRTase gene, which can be selected for and against with different chemicals, it is an ideal target for gene knockout and the introduction of multiple mutations.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may take physical form in certain parts and arrangement of parts, a preferred embodiment of which will be described in detail in this specification and illustrated in the accompanying drawings which form a part hereof and wherein:

FIG. 1 is an illustration of nine clones.

FIG. 2 illustrates the standard method of mutant sequence synthesis: selecting and ordering primer pairs and setting up PCR amplification reactions.

FIG. 3 illustrates a Cre-lox strategy to create a homozygous deletion of CFTR.

DESCRIPTION OF THE INVENTION

One Embodiment

A. Multiple control sequences synthesized according to U.S. patent application Ser. No. 10/236,168 and Ser. No. 10/487,234 (Lebo R V, Milunsky A, Wang Z, Yamin M. Synthesized Multiple Control Sequences Optimize Quality Control), both of which are incorporated herein by reference, can be cloned, before or after further splicing, into vectors carrying selectable genes and gene sequences that enhance insertion of the vector into a host genome and selection for the cells that retain these inserted sequences. In fact, “hit and run” vectors have been used to construct mutant mouse strains that mimic human genetic disease phenotypes. When the vector with its partial gene sequence recombines precisely with the homologous gene sequences in the target cell genome (the hit), the vector sequences, including its selectable gene markers and promoters as well as the DNA fragment(s) cloned into it, are all incorporated at the genomic site homologous to the cloned gene sequence. This introduces an additional control gene copy carrying the mutant, polymorphic, or variant sequences introduced by PCR or site-directed mutagenesis, along with a selectable gene that allows only cells with the targeted recombinant construct to survive in selective medium.

Because the hit leaves two homologous copies of the target sequence juxtaposed (adjacent to each other), the chromosome can loop to bring these copies together; the DNA strands denature and reanneal between the two copies, and breakage and recombination can then excise the inserted vector and its selectable gene markers (the run), leaving behind either the original gene sequence or the introduced mutant gene sequence. In the second outcome, the gene on that chromosome is mutated. This would be sufficient to produce a homozygous mutant of an X-linked gene in a male organism. This approach can therefore be used to synthesize multiple homozygous mutant controls for hemizygous X-linked genetic disease genes in male cells.
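Whether the hit-and-run procedure leaves the mutant or the original sequence depends on where the two crossovers fall relative to the mutation. The Python sketch below models sequences as strings and assumes a single base substitution, a circular vector integrated by one crossover at offset x, and excision by one intrachromosomal crossover at offset y between the resulting direct repeats; the function names and sequences are hypothetical. With the mutation at offset p and x ≤ p, the mutation is retained only when y falls beyond p.

```python
from typing import Tuple


def hit(left: str, target: str, right: str, mutant: str, backbone: str,
        x: int) -> Tuple[str, str, str]:
    """Single crossover ('hit') at offset x between the chromosomal `target`
    and a circular vector containing `mutant` + `backbone`. Returns the
    integrated chromosome and the two resulting direct-repeat copies."""
    if len(target) != len(mutant) or not 0 < x < len(target):
        raise ValueError("target and mutant must align and 0 < x < len")
    copy1 = target[:x] + mutant[x:]   # chromosome up to x, then vector
    copy2 = mutant[:x] + target[x:]   # rest of the vector circle, then chromosome
    return left + copy1 + backbone + copy2 + right, copy1, copy2


def run(left: str, copy1: str, copy2: str, right: str, y: int) -> str:
    """Second crossover ('run') at offset y between the two direct repeats;
    the backbone and one repeat are looped out and lost."""
    if not 0 < y < len(copy1):
        raise ValueError("0 < y < len(copy1) required")
    return left + copy1[:y] + copy2[y:] + right


normal, mutant = "AAAACAAAA", "AAAATAAAA"   # C->T substitution at offset 4
integrated, c1, c2 = hit("GG", normal, "CC", mutant, "neo", x=2)
print(run("GG", c1, c2, "CC", y=6))  # GGAAAATAAAACC  (mutant kept)
print(run("GG", c1, c2, "CC", y=3))  # GGAAAACAAAACC  (original restored)
```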

Alternatively, if this procedure is followed for an autosomal gene in a diploid organism, which carries one copy of the gene on the autosome inherited from each parent, then a normal individual's cells with two normal gene copies will be converted to heterozygous cells with one abnormal gene segment in one gene copy and a normal, unchanged gene on the other chromosome. When this procedure is completed in mouse embryonic stem cells, injecting these heterozygous mutant cells into blastocysts from a mouse strain with a different coat color and implanting the blastocysts (embryos) into hormone-prepared recipient female mice often results in an implanted chimeric embryo. This embryo develops along with its littermates and is born as a chimeric mouse, identifiable by a coat color arising not only from cells of the host blastocyst but also from cells of the parent cell line carrying the mutated genes. When such a chimera is bred to normal mice, chimeras whose germ line includes cells with the mutant genes transmit the mutant sequences to their offspring. In this fashion, offspring heterozygous for the gene can be isolated, grown in large quantity as whole organisms or in cell culture, and their DNA harvested and marketed as heterozygous controls carrying the inserted gene mutations. This approach can be applied to prepare multiple heterozygous mutant, variant, or polymorphic controls for an autosomal genetic disease gene or locus.

The targeted gene does not need to be the assayed gene or a homologous gene modified with mutant sequences. Rather, any gene on a chromosome that is regularly propagated can be targeted. For instance, X-linked genes in male cell lines are good targets, because loss of the single X chromosome would most likely kill the cell, since many X-linked genes are required for normal cell growth.

Alternatively, knocking out normal HGPRTase gene activity makes the cell resistant to azaguanine. Therefore, placing a segment of the HGPRTase gene in the vector carrying the mutant gene sequences to be controlled, so that vector insertion disrupts the HGPRTase gene, provides one additional selectable marker for delivering an independent control sequence to male cells.

Each time another gene fragment with mutant sequences is introduced independently, another selectable gene marker may be required. Alternatively, the recipient cells may carry a mutant gene that becomes normal when the construct is inserted, as with the HGPRTase gene. At the same time, the number of fragments that must be incorporated with individual selectable gene markers can be minimized by splicing as many gene fragments together as possible, according to U.S. Ser. No. 10/487,234, which taught the synthesis of multiple controls.
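Because each independently introduced fragment may need its own selectable marker, it helps to minimize the number of fragments. Treating this as a bin-packing problem gives the Python sketch below; the first-fit-decreasing heuristic, the amplicon sizes, and the insert-size limit are all assumptions for illustration, not values from the original method.

```python
from typing import List, Tuple


def pack_amplicons(amplicons: List[Tuple[str, int]],
                   max_insert_bp: int) -> List[List[str]]:
    """Group mutation-bearing amplicons into as few spliced fragments as
    possible using the first-fit-decreasing heuristic. Each fragment's
    summed length must not exceed max_insert_bp, a hypothetical cloning
    limit; splice-junction overlaps are ignored. Not guaranteed optimal."""
    fragments: List[List[str]] = []
    used_bp: List[int] = []
    for name, bp in sorted(amplicons, key=lambda a: a[1], reverse=True):
        if bp > max_insert_bp:
            raise ValueError(f"{name} ({bp} bp) exceeds the insert limit")
        for i, used in enumerate(used_bp):
            if used + bp <= max_insert_bp:
                fragments[i].append(name)
                used_bp[i] += bp
                break
        else:  # no existing fragment has room: start a new one
            fragments.append([name])
            used_bp.append(bp)
    return fragments


amplicons = [("a", 400), ("b", 300), ("c", 300), ("d", 200), ("e", 100)]
print(pack_amplicons(amplicons, max_insert_bp=600))
# [['a', 'd'], ['b', 'c'], ['e']]
```

Here 1,300 bp of amplicons cannot fit in fewer than three 600 bp fragments, so three selectable markers would be needed under these assumptions.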

It is noted that producing and selecting for cells that can be grown in very large quantities may be sufficient to produce a reliable source of DNA for distribution to all laboratories desiring these controls. It is also noted that the term “mutant” when used to describe a DNA sequence, may be replaced by the term “variant” indicating a normal or abnormal change occurring in less than 1% of individuals, or “polymorphism” which is a DNA change that is found in >1% of all individuals of that species.

After these heterozygous mice are identified by DNA analysis, two heterozygous individuals are bred, and on average 1 in 4 offspring are homozygous for the mutant cystic fibrosis alleles. Thus cells homozygous for the mutated gene can be isolated. This is one additional method to obtain homozygous mutant cells from which to isolate DNA for use as multiple controls.
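The 1-in-4 figure is the Mendelian expectation for an intercross of two heterozygotes; individual litters will deviate from it. A small Python sketch, using hypothetical allele symbols, enumerates the gamete combinations.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def cross(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Expected offspring genotype frequencies at one autosomal locus under
    Mendelian segregation. Genotypes are two-character strings: 'N' is the
    normal allele and 'm' the mutant allele, e.g. 'Nm' for a heterozygote."""
    # Each parent transmits one of its two alleles with equal probability.
    gametes = product(parent1, parent2)
    counts = Counter("".join(sorted(pair)) for pair in gametes)
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


print(cross("Nm", "Nm")["mm"])  # 1/4
```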

B. An alternative means to obtain a cell line with equal numbers of mutant and normal sequences is to construct a hit-and-run vector with two mutated sequences inserted instead of one. This can be done readily by adding selected sequences at the 5′ ends of the PCR primers. After a single DNA segment has been amplified with one or more mutant sequences, two additional PCR amplifications are performed. In one embodiment, two small aliquots of the same PCR-amplified DNA segment are amplified. The two upstream primers differ from each other so that the final products can be distinguished; for instance, they may carry two different restriction enzyme sites with a few flanking basepairs added. Each downstream primer carries not only the original primer sequence but also an additional sequence about 22 basepairs long that is the reverse complement of the primer used in the other PCR reaction. PCR amplification then produces two different products that can be mixed and amplified again using only the primers with restriction enzyme sites. This yields one fragment about twice as long as either original PCR product that is, for the most part, a palindrome in a single strand. Restriction enzyme digestion of both ends produces a fragment with different sticky ends that can be directionally cloned into a vector whose cloning site contains both restriction enzyme sites. A vector with multiple rare or common cloning sites may be constructed as taught in U.S. patent application Ser. No. 10/236,168 (Lebo et al.).
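The structure of the doubled fragment can be checked in silico. The Python sketch below builds an idealized version of the joined product, assuming EcoRI (GAATTC) and BamHI (GGATCC) as the two distinct end sites and an arbitrary placeholder core; it ignores the primer-tail overlap at the junction and the few clamp bases outside each site. It confirms that the central region equals its own reverse complement, which is what a palindrome in a single strand means here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_duplicate(core: str, left_site: str, right_site: str) -> str:
    """Idealized top strand of the joined product: the core amplicon, then
    its reverse complement, flanked by two different restriction sites.
    The ~22 bp junction overlap from the primer tails is ignored."""
    return left_site + core + revcomp(core) + revcomp(right_site)


def is_self_complementary(seq: str) -> bool:
    """True if seq reads the same as its reverse complement (a DNA palindrome)."""
    return seq == revcomp(seq)


ECORI, BAMHI = "GAATTC", "GGATCC"
core = "ATGCGTTAC"  # placeholder for a mutation-bearing amplicon
product_seq = inverted_duplicate(core, ECORI, BAMHI)
print(product_seq)                                  # GAATTCATGCGTTACGTAACGCATGGATCC
print(is_self_complementary(core + revcomp(core)))  # True
print(is_self_complementary(product_seq))           # False: distinct end sites
```

The distinct end sites are what make directional cloning possible even though the interior is symmetric.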

Using this vector with two copies of the multiple abnormal controls inserted, the “hit” portion of the procedure described in Section A can now be completed. This inserts two abnormal sequences into a gene that originally had two normal alleles, one on each homologous chromosome inherited from each parent. The result is a single cell with equal numbers of normal and mutant allelic target sequences (two of each). This cell, chosen for its known long-term propagation characteristics [add cell line used in chromosome sorting], will produce a renewable source of heterozygous DNA controls that can be used to compare and quantify the target sequences. This solves the problem of constructing a heterozygous control with the same number of normal and abnormal sequences in the same cell line.

C. A cell line with homozygous mutant controls can be prepared in several ways. (1) The controls can be synthesized and spliced into a selectable vector together with an additional targeting sequence homologous to a gene found only in the unrelated species from which the target cell line is derived, and the multiple mutations then placed into this cell line. (2) Alternatively, a cell line with a deletion of a sex-linked gene, or a cell line with two (homozygous) deletions of the gene for which controls are being prepared, will serve as a suitable alternative; however, such a line will typically not be available for every gene to be controlled. (3) Yet another means to obtain a cell line with only homozygous mutations is to insert a vector carrying two mutant sequences into the X chromosome, fuse the human cell with a rodent cell to form a somatic cell hybrid, and select for retention of the human X chromosome as the other human chromosomes are lost randomly during cell division.

The first step is to select a robust cell line from a species that is sufficiently distant evolutionarily that its genes with similar functions do not interfere with the assay for which the control is being developed. For instance, salmon sperm DNA has typically been added to cDNA probe hybridization mixtures, since Southern blot restriction enzyme analysis was first used, to block most of the nonspecific binding sites on the nitrocellulose or nylon filter to which the restriction enzyme digested DNA had been bound (Maniatis). The specific binding of a radiolabeled globin gene probe could then be readily discerned over the remaining nonspecific binding of the probe to the filter. Thus salmon sperm genomic DNA is quite suitable as carrier DNA for globin gene mutation analysis. DNA from multiple organisms, such as Drosophila (or any other insect), chicken (bird), bat or whale (mammal), yeast (a single-celled fungus), or bacteria, can be tested to determine which is most suitable for other tested genes. For applications with RNA, yeast or mammalian placental tRNA or mRNA are two alternative carrier sources. Mammals include many orders, represented by humans, mice, bats, elephants, bears, sea lions, cats, whales, cattle, and horses (reference). Total DNA (or RNA) from the candidate species must be tested in parallel with normal DNA from the species being assayed and found to give no competing signal that could be confused with the signal from the sample being tested. This application requires a whole living cell that can be cultured readily in the laboratory, including cells from any of the organisms listed above.
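The parallel wet-lab test remains the deciding check, but candidate carrier genomes could be pre-screened in silico for sites that might bind the assay's primers or probes. The Python sketch below is a swapped-in illustration, not part of the original method: a brute-force Hamming-distance scan of both strands with a placeholder mismatch threshold and a placeholder carrier sequence. Genome-scale screens would use an indexed aligner such as BLAST rather than this quadratic loop.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def near_matches(oligo: str, genome: str,
                 max_mismatches: int = 2) -> List[Tuple[int, str, int]]:
    """Brute-force scan of both strands of `genome` for windows within
    max_mismatches (Hamming distance) of a primer or probe sequence.
    Returns (start, strand, mismatches) tuples in positional order."""
    k = len(oligo)
    probes = (("+", oligo), ("-", revcomp(oligo)))
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for strand, probe in probes:
            mismatches = sum(a != b for a, b in zip(probe, window))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


carrier = "TTTTAAACCCTTTTGGGTTTAAAA"  # placeholder carrier sequence
print(near_matches("AAACCC", carrier, max_mismatches=0))
# [(4, '+', 0), (14, '-', 0)]
```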

The robust cell line can be established from a skin biopsy of the required organism or purchased from a cell repository, so long as the repository does not prohibit modifying the cell line to make it more useful and then distributing products derived from it. Immortalized cell lines can also be used for this purpose. However, if the means of immortalization results in unstable chromosome complements that can change the ratio of introduced recombinant sequences to total genomic DNA, then each batch of prepared controls must be verified for all the controls grown in the cell line. Each batch should be compared with an aliquot of DNA isolated from an earlier culture that was judged suitable for the application, as well as with any other suitable normal DNA controls. At the same time, some permanent cell lines, like the 46,XX and 46,XY cell lines (GM and GM) used for chromosome sorting, have stable chromosome complements. Using such a line merely requires monitoring its suitability; many frozen aliquots of cells found to be suitable can be stored and thawed quickly whenever the current culture might evolve to become unusable. The number of target sites relative to the number and identity of the chromosomes in the cell line can be monitored readily by fluorescence in situ hybridization, using as probes the vector and inserts placed into the selected cell line.
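The text proposes fluorescence in situ hybridization for monitoring. A complementary, swapped-in way to compare a new batch against an earlier accepted batch is quantitative PCR with the comparative Ct (2^-ΔΔCt) method. The Python sketch below assumes near-100% and equal amplification efficiency for the control-insert assay and a reference-gene assay; the Ct values and the ±25% acceptance window are placeholders, not validated criteria.

```python
def relative_copy_ratio(ct_target_new: float, ct_ref_new: float,
                        ct_target_old: float, ct_ref_old: float) -> float:
    """Comparative Ct (2^-ddCt) ratio of control-insert copies per
    reference-gene copy in a new batch relative to an accepted earlier
    batch. Assumes ~100% and equal PCR efficiency for both assays."""
    delta_new = ct_target_new - ct_ref_new
    delta_old = ct_target_old - ct_ref_old
    return 2.0 ** -(delta_new - delta_old)


def batch_ok(ratio: float, tolerance: float = 0.25) -> bool:
    """Accept the batch if the ratio lies within +/- tolerance of 1.0
    (placeholder acceptance window, not a validated criterion)."""
    return abs(ratio - 1.0) <= tolerance


# Hypothetical Ct values: control insert vs. a single-copy reference gene.
ratio = relative_copy_ratio(ct_target_new=26.0, ct_ref_new=20.0,
                            ct_target_old=25.0, ct_ref_old=20.0)
print(ratio, batch_ok(ratio))  # 0.5 False
```

A one-cycle shift corresponds to a twofold change, which is the kind of drift expected if a culture lost the insert-bearing chromosome in half its cells.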

The next step is to prepare the multiple mutant controls spliced together into a minimal number of DNA fragments. Each fragment is then introduced into a selectable vector along with a short gene sequence unique to the chosen target cell line. Each vector is then introduced into the chosen target cells, one vector per target cell culture, and the cells are selected for retention of the selectable gene on the vector. With selectable gene markers, typically only the cells expressing the gene grow well in the selective medium. Cells from each target culture are then screened to find recombinant daughter cells that retain the delivery vector and the mutant gene sequences. Fluorescence in situ hybridization is one means to determine how many copies of the vector are incorporated into each cell line and at which target chromosome location(s). Typically, a cell line carrying one vector copy will be the construct of choice for mutation control production. If 25 mutations are cloned into 8 fragments, then a target cell line with about one-eighth as much DNA as the tested species can be used. Eight such cell lines can then be grown, their DNA quantified, and equal quantities of DNA from each cell line mixed and provided as a control to laboratories requesting this material. Alternatively, cells from organisms with even less DNA can be mixed with DNA from the same organism that lacks mutant control sequences, to provide total genomic DNA with the correct number of mutant DNA sequences relative to the total DNA in each cell of the tested species.
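The arithmetic behind the one-eighth figure can be checked directly. The Python sketch below assumes that "one-eighth as much DNA" refers to the haploid genome of the tested species and that each control line carries one insert per cell; under those assumptions the equal-mass pool contains one insert copy per haploid genome equivalent, the dosage of a homozygote. If the figure instead refers to diploid content, the same pool gives half that dosage. The function name and units are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional


def copies_per_genome_equivalent(genome_sizes: List[Fraction],
                                 masses: List[Fraction],
                                 inserts_per_cell: Optional[List[int]] = None,
                                 carrier_mass: Fraction = Fraction(0)) -> List[Fraction]:
    """Copies of each control insert per haploid genome equivalent of the
    tested species in a pooled DNA sample.

    genome_sizes: DNA per cell of each control line, in units of the tested
        species' haploid genome (e.g. Fraction(1, 8)).
    masses: DNA mass contributed by each line, in the same units.
    inserts_per_cell: insert copies per cell of each line (default 1).
    carrier_mass: DNA added from the same organism without inserts.
    """
    if inserts_per_cell is None:
        inserts_per_cell = [1] * len(genome_sizes)
    # Because masses are in haploid-genome units, the pooled mass equals the
    # number of haploid genome equivalents of the tested species.
    equivalents = sum(masses, Fraction(0)) + carrier_mass
    # mass / DNA-per-cell = number of cells; times inserts per cell = copies.
    return [m / s * c / equivalents
            for s, m, c in zip(genome_sizes, masses, inserts_per_cell)]


# Eight lines with 1/8 of the haploid DNA content, mixed in equal amounts:
eight = copies_per_genome_equivalent([Fraction(1, 8)] * 8, [Fraction(1)] * 8)
print(eight[0])  # 1
# Lines with 1/16 as much DNA, diluted with an equal mass of insert-free carrier:
diluted = copies_per_genome_equivalent([Fraction(1, 16)] * 8, [Fraction(1)] * 8,
                                       carrier_mass=Fraction(8))
print(diluted[0])  # 1
```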

D. Artificial centromeres have been constructed from cloned arrays of the 171 bp common centromeric repeat in human chromosome centromeres. The DNA in these artificial centromeres is duplicated along with the DNA in the other chromosomes. During the subsequent mitotic cell division the centromeric repeats segregate to each daughter cell with a high degree of reliability for many cell generations. Therefore these structures have been proposed as a means to introduce whole normal genes into cells lacking active genes in order to cure human genetic disease. This communication teaches that the same protocol can be used to introduce synthesized mutant gene fragments into previously constructed artificial chromosomes. When selectable gene markers are simultaneously introduced, the recombinant centromere can then be selected so that the cells with the centromere outgrow the other cells and take over the culture.

Because artificial centromeres segregate with other chromosomes, larger amounts of genomic sequences can be added with an ever increasing number of selectable genes. In this fashion, two or more mutant DNA sequences could be inserted. Furthermore, the length of these sequences would be much greater. For instance, if one wanted to add the entire coding sequence of a long gene like dystrophin (14 kb), this should be possible without exceeding the carrying capacity of the artificial centromere. Therefore many control sequences for many different genes could be incorporated into the same artificial centromere. Then fewer and fewer cell lines will need to be constructed and marketed in order to control for the ever expanding repertoire of reported disease gene loci and their common mutations. This can be seen to be quite useful for controls required for microchips that can test hundreds to thousands of individual DNA sequences simultaneously.

Alternative Embodiments

As described in U.S. patent application Ser. No. 10/236,168, controls can be synthesized by PCR as well as by site-directed mutagenesis. Other embodiments can also be used to synthesize the multiple mutation controls, including synthesis of DNA fragments that themselves act as controls or test primers, or that act as PCR primers that can be annealed and extended by DNA polymerase to obtain ever longer DNA fragments for subsequent use or manipulation according to the protocols taught by the original application and expanded upon in this filing.

Other embodiments include synthesizing controls long enough to serve as controls for restriction enzyme analysis. This may simply involve specific single or multiple nucleotide changes that abolish or create a restriction enzyme site, or modifying PCR primers so that a restriction enzyme site is created in only one of the amplified allelic targets at a specific site. Such controls may also be prepared from sequences longer than typical PCR-amplified fragments, for use in restriction enzyme analysis with or without Southern blotting. PCR has often replaced Southern blot analysis for routine RFLP testing. However, restriction enzyme analysis may still be required to detect (1) very long trinucleotide repeat expansions or (2) methylated sequences reflecting gene inactivation by imprinting, X inactivation, or other gene-specific modification of gene expression, for which restriction enzyme digestion with or without Southern blotting is required.
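Whether a nucleotide change abolishes or creates a restriction site can be checked before a control is synthesized. The Python sketch below assumes exact recognition sequences without degenerate bases (EcoRI GAATTC and BamHI GGATCC serve as examples), substitutions only, and ignores methylation sensitivity, which matters for the methylation-detecting digests mentioned above. The sequences are placeholders.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> List[int]:
    """0-based starts of a recognition site on either strand, allowing
    overlapping matches (zero-width lookahead)."""
    starts = set()
    for motif in {site, revcomp(site)}:
        pattern = "(?=" + re.escape(motif) + ")"
        starts.update(m.start() for m in re.finditer(pattern, seq))
    return sorted(starts)


def site_change(normal: str, mutant: str, site: str) -> Tuple[List[int], List[int]]:
    """Compare equal-length normal and mutant sequences (substitutions only)
    and return (sites abolished, sites created) by start position."""
    if len(normal) != len(mutant):
        raise ValueError("sketch assumes a substitution, not an indel")
    before = set(site_positions(normal, site))
    after = set(site_positions(mutant, site))
    return sorted(before - after), sorted(after - before)


print(site_change("AAGAATTCAA", "AAGAATTAAA", "GAATTC"))  # ([2], [])
print(site_change("TTGGATACTT", "TTGGATCCTT", "GGATCC"))  # ([], [2])
```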

CONCLUSIONS, RAMIFICATIONS, AND SCOPE

U.S. application Ser. No. 10/236,168 (which is incorporated herein in its entirety, including definitions) taught that every one of the 100 selected cystic fibrosis mutations could be synthesized readily, even when patient material was unavailable and only the published sequence was available. This even included a 10 kb deletion, synthesized by judicious selection of primers that amplified the upstream and downstream sequences independently before splicing the two together, even though the original amplified sequences were 10,000 basepairs apart. As taught herein, these amplified sequences, which may be cloned, can be diluted to resemble the unknown genomic samples to be tested by the DNA assay. Dilution can be completed with any RNA or DNA that is sufficiently different at all loci not to interfere with the assay; alternatively, the sequences can be introduced into living cells so that they are propagated along with the other genomic DNA, in the proportions of the gene loci and total DNA being tested by the assay. Furthermore, these multiple controls can be introduced not only into chromosomes of any plant or animal species whose cells can be cultured in the laboratory, but also into artificial chromosomes that can be synthesized for frequently tested species and used to carry many fragments of control gene sequences.
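Splicing two independently amplified segments across a large deletion is commonly done by splicing by overlap extension (SOE) PCR; whether this matches the exact primer design used in Ser. No. 10/236,168 is an assumption. The Python sketch below derives the two inner junction primers from a reference sequence, with placeholder lengths (20-base annealing region, 10-base tail) and a short placeholder reference standing in for the 10 kb region. Outer primers and melting-temperature checks are omitted.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def deletion_allele(ref: str, start: int, end: int) -> str:
    """Expected allele after deleting ref[start:end] (0-based, half-open)."""
    return ref[:start] + ref[end:]


def junction_primers(ref: str, start: int, end: int,
                     anneal: int = 20, tail: int = 10) -> Tuple[str, str]:
    """Inner primers for splicing by overlap extension across a deletion.

    Returns (upstream_reverse, downstream_forward), both written 5'->3'.
    Each anneals `anneal` bases on its own side of the junction and carries
    a 5' tail of `tail` bases copied from the other side, so the two PCR
    products share a 2 * tail overlap spanning the new junction."""
    if not (tail <= anneal <= start and end + anneal <= len(ref)):
        raise ValueError("primer lengths do not fit around the deletion")
    upstream_reverse = revcomp(ref[start - anneal:start] + ref[end:end + tail])
    downstream_forward = ref[start - tail:start] + ref[end:end + anneal]
    return upstream_reverse, downstream_forward


# Placeholder 200 bp reference; ref[60:140] stands in for the deleted region.
ref = ("ACGTTGCATGCAGTCA" * 13)[:200]
allele = deletion_allele(ref, 60, 140)
rev, fwd = junction_primers(ref, 60, 140)
overlap = allele[60 - 10:60 + 10]  # 20 bp spanning the new junction
print(len(allele))                                              # 120
print(fwd in allele, revcomp(rev) in allele)                    # True True
print(revcomp(rev).endswith(overlap), fwd.startswith(overlap))  # True True
```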

Also taught herein is how to distinguish multiple controls that are added to the same reaction mixture as the total unknown DNA to be tested, either at the same time or at a different time. The controls can be distinguished by labeling them with reporter molecules whose readily detectable characteristics do not overlap with those of the reporter labeling the sequences tested from the unknown DNA sample. These can be major differences in fluorescence, such as infrared-labeled control sequences and fluorescein-labeled unknown sequences. This includes reporter molecules already described and yet to be described, such as those that phosphoresce and dyes detected within the visible and invisible spectrum of wavelengths.
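The requirement that control and sample reporters not overlap can be checked mechanically when a panel is designed. This is a minimal sketch, assuming each reporter is described by a single detection window in nanometers; the window values, the `Reporter` class, and the dye names other than fluorescein are illustrative assumptions, not measured dye properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reporter:
    name: str
    low_nm: float   # lower edge of the detection window
    high_nm: float  # upper edge of the detection window
    role: str       # "control" or "sample"


def windows_overlap(a: Reporter, b: Reporter) -> bool:
    return a.low_nm <= b.high_nm and b.low_nm <= a.high_nm


def panel_conflicts(reporters: list[Reporter]) -> list[tuple[str, str]]:
    """Pairs of control and sample reporters whose detection windows overlap."""
    conflicts = []
    for i, a in enumerate(reporters):
        for b in reporters[i + 1:]:
            if a.role != b.role and windows_overlap(a, b):
                conflicts.append((a.name, b.name))
    return conflicts


# Illustrative windows only; real values come from dye and instrument specifications.
sample = Reporter("fluorescein", 500, 550, "sample")
infrared = Reporter("infrared_dye", 780, 840, "control")
close_dye = Reporter("hypothetical_dye", 530, 580, "control")

print(panel_conflicts([sample, infrared]))   # []
print(panel_conflicts([sample, close_dye]))  # [('fluorescein', 'hypothetical_dye')]
```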

Finally, taught herein is how to prepare viruses and produce noninfectious RNA products for comparison with RNA viruses as well as with RNA gene products.

Taken together, the ramification of multiple controls may be to prepare and use these reagents wherever they are needed with the available technology in order to improve quality control of manufactured laboratory test formats, laboratory-prepared reagents (“home brew”), and incomplete or complete manufactured test kits. These improvements will affect quality assurance programs, which can use these reagents to test laboratory proficiency as well as to monitor the laboratories' self-monitoring and reliability during inspections and when other materials are requested. Best of all, consumers will be assured of the continuing optimization of all reported laboratory tests of RNA and DNA, so that decisions based upon the results can be made with greater confidence that the underlying data have been optimized as thoroughly as possible.

The scope of this patent may include all living organisms having DNA and/or RNA as their genetic code, including viruses, bacteria, plants, and animals. Optimal controls for polymorphic lengths will result in highly reliable DNA “fingerprints” at multiple polymorphic loci, so that criminals who are arrested for an unrelated crime can be connected to other crime scenes or victims even though the incident was months or years earlier (Lebo R, Maher T, Farrer L, Yosunkaya Fenerci E, Milunsky J. Highly polymorphic short tandem repeat analyses clarify complex molecular test results. Diagnostic Molecular Pathology, 10(3):179-189, 2001, incorporated herein by reference). Optimal trinucleotide repeat controls can be used to readily distinguish normal, gray zone, and affected categories for genetic diseases arising from these mutation categories. Using highly reliable polymorphic controls to measure short nucleotide repeats, the identity and relationship of living persons in subsequent generations can be compared with records of DNA results obtained in this generation, with numerous applications.
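A trinucleotide repeat control is useful because it checks that fragment sizing lands in the correct category. The sketch below estimates the repeat number from a sized PCR product and assigns a category. It assumes the non-repeat (flanking) length of the amplicon is known, and the cutoffs are hypothetical placeholders: real boundaries are disease-specific and must come from the relevant clinical guideline.

```python
def repeat_count(amplicon_bp: float, flank_bp: int, unit_bp: int = 3) -> int:
    """Estimate repeat units from a sized PCR product.

    flank_bp is the non-repeat length of the amplicon (primers plus flanking sequence).
    """
    if amplicon_bp < flank_bp:
        raise ValueError("amplicon is shorter than its flanking sequence")
    return round((amplicon_bp - flank_bp) / unit_bp)


def classify(count: int, normal_max: int, gray_max: int) -> str:
    """Assign a repeat count to a category using disease-specific cutoffs."""
    if count <= normal_max:
        return "normal"
    if count <= gray_max:
        return "gray zone"
    return "affected"


# Hypothetical assay: 100 bp of flanking sequence, cutoffs chosen for illustration only.
for size in (190, 205, 220):
    n = repeat_count(size, flank_bp=100)
    print(size, n, classify(n, normal_max=30, gray_max=39))
```

Running controls of known repeat length next to each category boundary shows whether sizing error could move a sample across a cutoff.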

DEFINITIONS

The following patents and patent applications are incorporated entirely herein by reference: U.S. Pat. Nos. 5,876,927, 5,723,593, 5,654,148, 5,665,540, and U.S. patent application Ser. Nos. 10/236,168 and 60/625,863.

What is Taught Herein Includes

A method of optimizing quality control for testing specific nucleotide sequences comprising the steps of: testing for the presence of normal, mutant, and/or polymorphic nucleotide sequences at a defined genetic locus; and confirming that there is no interference by other homologous (similar) sequences at different loci.

The genetic test control of claim 1 further comprising the isolation, propagation, and analysis of cloned homologous genomic DNA segments (including pseudogenes) that could interfere with analysis of total genomic mutant, polymorphic or variant sequences in the subject's tested DNA.

The genetic test control may further include the isolation, propagation, and analysis of DNA from cell lines with homozygous deleted gene region(s).

The genetic test control may also include the isolation, propagation, and analysis of DNA from cell lines with heterozygous deleted gene region(s).

The genetic test control may alternatively include the isolation, propagation, and analysis of DNA from constructed lines with multiple homologous, mutant, or variant control sequences inserted into one or more whole chromosome sites in engineered or normal cells.

The genetic test control of claim 1 further comprising the isolation, propagation, and analysis of DNA from a cell line with an artificial chromosome carrying multiple homologous, mutant, or variant control sequences that were inserted into dividing cells that duplicate the artificial chromosome along with their own genome during the cell cycle and segregate duplicate copies of both into daughter cells at mitosis.

The genetic test control of claim 1 further comprising the isolation, propagation, and analysis of RNA or DNA from infectious control sequences from claims 1 or 2, selected from infectious organisms' RNA that is reverse transcribed, cloned, propagated in DNA vectors, and transcribed (converted back) to RNA from adjacent promoter sequences such as T7 and T3.

Multiple control sequences from claims 1-6 for common mutations in genetic diseases including inherited diseases that are autosomal dominant, autosomal recessive, maternally inherited in the mitochondrial genome, paternally inherited in the Y chromosome, or acquired genetic disease like cancer.

Multiple control sequences for polymorphisms from claims 1 to 7 that distinguish the individual, relationship of purported relatives, species, or infectious disease organism or strain from which the DNA is derived.

The genetic test control of claim 1 further comprising any homologous genomic DNA or RNA test control without sequences from the specifically tested genetic locus sequence to preclude detection of another genomic locus by a specific assay.

The genetic test control of claims 1 and 2, or 3, or 4, or 5, or 6, or 7, or 10 further comprising: a second homologous sequence having sufficient length to be tested by the assay and containing sufficiently homologous sequences to require test comparison to at least a first mutation being tested.

A genetic test control comprised of: one or more homologous sequences sufficiently long to span the homologous portion of the locus tested and to confirm that it is not being detected in the assay because of nucleotide sequence change(s) that are not found in the mutant or normal sequence.

The genetic test control of claim 15, further comprising: a homologous sequence having sufficient length to be tested by the assay and containing at least a first mutation being tested.

The genetic test control of claim 16, wherein the number of copies of the one or more homologous sequences sufficiently long to span the homologous portion of the locus tested is substantially the same as the number of copies of the second genetic sequence.

The genetic test control of claim 16, wherein the second genetic sequence has at least three mutations.

The genetic test control of claim 18, wherein the number of copies of the one or more homologous sequences sufficiently long to span the locus tested is substantially equal to the number of copies of the second genetic sequence.

The genetic test control of claim 18, wherein the number of copies of the one or more homologous sequences sufficiently long to span the homologous portion of the locus tested is approximately one half of the number of copies of a gene locus being tested.

A genetic test control, the control comprising: a first portion, wherein the first portion is comprised of one or more homologous sequences sufficiently long to span a portion of the locus tested and to confirm that it is not being detected in the assay because of its nucleotide sequence change(s) that are not found in the tested mutant or normal sequence; and a second portion, wherein the second portion is a tested genetic sequence having sufficient length to be detected by the assay and containing at least a first mutation being tested.

Mixtures of multiple controls from claims 1 and 2, or 3, or 4, or 5, or 6, or 7, or 10, or 21 can be prepared and marketed as part of, or to complement, independently manufactured disease testing kits.

The genetic test control of claims 3 or 4 in which selected minimal-length segments of target gene regions are deleted using O-type or Ω-type insertion and deletion events to optimize the efficiency of the process.

The genetic test control of claim 23 in which cell lines with one of one, one of two, or two of two target gene regions may be selected as starting material to minimize the manipulation required to obtain a hemizygous, heterozygous, or homozygous deletion of a selected gene region or regions of 1 or more genes.

The genetic test control of claim 23 in which the cell with selected gene deletions, polymorphisms, or other rearrangements can be or has been immortalized using standard vectors like SV-40 for fibroblasts and Epstein-Barr virus for B-lymphocytes.

The genetic test control of claims 1 or 3 or 4 or 23 using cell lines spontaneously transformed in culture, such as B-lymphocytes cultured from patients previously infected with Epstein-Barr virus.

The genetic test control of claim 23 in which cloned or artificially constructed sequences with mutant, polymorphic or normal sequences can be inserted into cellular genomic DNA using O-type or Ω-type insertion events in cell lines prepared according to claim 1 or claim 2, or both claims 1 and 2.

The genetic test control of claim 1, or 3, or 4 in which selectable genes, such as the neomycin resistance gene (FIG. 3.4, Kresina T F, Molecular Medicine and Gene Therapy, Wiley-Liss, New York, 2001, p. 55), the APRTase gene, or the HGPRTase gene, are used to allow recombinant cells to grow while inhibiting the growth of cells without the desired recombinant gene, to select for events in claim 1, claim 3, or claims 1 and 3, or claim 23 or 27.

The genetic test control of claim 1, or 3, or 4 in which the location of each sequence to be deleted by recombination will be selected to flank the region(s) to be controlled in both upstream and downstream positions, so long as recombination at these locations removes the selected gene sequences to be controlled for more than one gene mutation or polymorphic sequence.

The genetic test control of claim 1, or 3, or 4 in which a single nucleotide substitution or any desired sequence with more than one mutation can be completed by the typical O-type insertion and excision of a “hit and run” vector and would not require deletion of a gene region.

The genetic test control of claim 1, or 3, or 4 in which one or more mutant, polymorphic, or variant gene sequences can be inserted into any targeted site on any chromosome including the hemizygous X-chromosome in a male cell line so long as the upstream and downstream flanking sequences are homologous to the targeted site to be recombined.

The genetic test control of claim 1, or 3, or 4 in which the multiple targeted sequences can be in any order without regard to the natural order found in genes to minimize the total length of the inserted DNA and maximize the utility of the recombinant cell line.

The genetic test control of claim 4 in which an individual is identified with the desired single mutation on both autosomal gene sequences.

The genetic test control of claim 4 in which an individual is identified with a homozygous deletion of the gene in which the homozygous mutation is desired, and 1 or 2 copies of the mutation are inserted anywhere in the genome (one copy for hemizygous and 2 copies for homozygous).

The genetic test control of claim 4 in which an individual is identified with one deleted gene region, the intact gene is targeted, and a hit and run vector is used to add 1 mutant gene copy for a hemizygous mutation and 2 mutant gene copies for a homozygous mutation.

The genetic test control of claim 4 in which a cell line is constructed with the gene region deleted from both homologous autosomes carrying the gene, prior to reinserting 1 or 2 mutant sequences to be tested.

The genetic test control of claim 4 using an evolutionarily distant eukaryotic cell line with no homologous gene to the desired sequence and adding 1 or 2 mutant sequences at any selected genomic sequence.

The genetic test control of claim 3 in which an individual is identified with a homozygous deletion of the gene region where the homozygous mutations are desired, and 1 or 2 copies of the mutation are inserted anywhere in the genome (one copy for hemizygous and 2 copies for homozygous).

The genetic test control of claim 3 in which an individual is identified with one deleted gene region, the intact gene is targeted, and a hit and run vector is used to add 2 or more gene mutations in 1 fragment for a hemizygous mutation and in 2 mutant fragments for a homozygous mutation.

The genetic test control of claim 3 in which an individual is identified with the gene region deleted from both homologous autosomes carrying the gene, prior to reinserting 2 or more mutant sequences to be tested in 1 fragment for a hemizygous mutation and in 2 mutant fragments for a homozygous mutation.

The genetic test control of claim 3 using an evolutionarily distant eukaryotic cell line with no homologous gene to the desired sequence and adding 2 or more mutant sequences at any selected genomic sequence.

The genetic test control of claim 1 using hemizygous mutations that may be preferable to homozygous mutations because some assays quantify the gene target while others only test for the presence of the mutant, polymorphic, or normal allelic sequence.

The genetic test control of claims 1 or 42 in which control sequences can be inserted into unique gene sites like the HGPRTase locus on the X chromosome of a hemizygous male.

This application teaches cloning multiple control sequences into vectors that can be incorporated into artificial chromosome centromeres or into chromosomal DNA targets in normal or abnormal chromosomes in the parent cell line and selecting for the cells with incorporated sequences. Constructed cell lines grow and divide to maintain the ratio of multiple control DNA sequences to total genomic DNA sequences. These transformed cell lines are grown, and their DNA is extracted, quantified, and used directly. One application is to test the homozygous mutant and heterozygous controls from the beginning of the assay along with the unknown samples in two separate control tubes. Controls distinguished by different reporter molecules (Next section) can also be added to unknown samples at multiple points during the assay.

SUMMARY OF RELATED ART

Currently available controls for genetic and infectious disease testing are rare resources that might be obtained in small quantities from very cooperative laboratories already performing the test. These materials are typically in short supply because they represent the nucleic acid remaining after disease testing of a small tissue or body fluid aliquot obtained from a patient and submitted to the laboratory for testing for a suspected genetic disease gene or an infectious disease. Furthermore, laboratories often require signed consent forms or other carefully followed protocols in order to use these excess materials during subsequent testing of patient samples. After validating the test, the laboratory typically categorizes and stores these control stocks for ready access and use whenever required. When a testing laboratory exhausts its original supply, it attempts to replace it with nucleic acid from another sample that has been tested since the test was introduced and found to have the desired sequence. Another approach for cooperating laboratories has been to submit patient lymphocyte samples with abnormal nucleic acid sequences to the Coriell Institute or the American Type Culture Collection for maintenance and distribution to qualified laboratories doing additional disease research or testing. The patient lymphocytes are then transformed (immortalized) with Epstein-Barr virus so that the cell line can be propagated nearly indefinitely. At the same time, selected infectious disease agents can be purchased through facilities that routinely handle biohazardous materials and their distribution. Typically, federal contracts, grants, and user fees maintain these resource facilities.

Although these facilities perform a very important function, the characteristics of the controls provided in the distributed cell lines may not be optimal for each assay. For instance, one excellent control for an autosomal recessive genetic disease assay that simultaneously tests the normal and abnormal sequences at 25 different gene sites is one or a few cloned DNA fragments of sufficient length, spliced in any order, carrying each mutation to be tested and no normal sequence at any of the mutation sites tested in any cloned fragment. A second control would be a normal control DNA with none of the tested mutations, tested in a separate reaction mixture. A third control could be heterozygous sequences with equal numbers of normal and mutant sites, to determine whether each is quantified correctly in any unknown sample. However, using only a transformed cell line from a heterozygous R560T carrier would be a suboptimal test control, because it could not determine whether the assay can distinguish a homozygous from a heterozygous patient sample. In fact, during the development of cystic fibrosis controls to enable this patent, at least one mutation site tested by the multiple-mutation test strip provided by one large manufacturer could not distinguish a heterozygous genotype (one normal and one mutant allele) from a homozygous mutant genotype. Therefore the sequences at this location had to be redesigned in the test assay to correct the deficiency. Furthermore, not all the mutations are available for each desired test. For example, 4 mutations were unavailable to laboratories purchasing patient cell lines from the Coriell Institute and commercial cystic fibrosis test kits. At the same time, the purchased cells included very few homozygous mutant controls, so they could not test whether the assay distinguished a homozygous from a heterozygous patient sample.
In contrast, all four of these mutations could readily be synthesized as homozygous or heterozygous controls using PCR or site directed mutagenesis as taught in our prior patent submission.

Even if 25 cell lines with homozygous cystic fibrosis mutations at all the gene sites tested had been available, together they would not have provided an optimal control, because each cell line would also carry homozygous normal sequences at all the other gene loci tested. Therefore only one homozygous mutant site could be tested with each available control cell line to determine whether the assay was sufficiently specific to distinguish a homozygous from a heterozygous patient sample, and each test would have required its own cystic fibrosis test strip. Furthermore, DNAs from multiple cell lines could not have been mixed, because the normal allelic sites would outnumber the mutant allelic sites and the signal from the normal sequence would typically have been substantially more intense than the abnormal test signal. Because the cost of an individual test strip represents a major proportion of the reimbursed test fee, testing each available homozygous mutation along with each available heterozygous mutation and the unknown samples in every assay becomes prohibitively expensive. Therefore some laboratories have been rotating different subsets of mutation controls among individual assays. This would not detect test failure for mutations lacking suitable test controls in a given assay.
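The dilution problem with pooling can be made concrete. This minimal sketch assumes each cell line is diploid, all lines have the same genome size, and equal DNA masses are mixed; the site names are hypothetical. Pooling 25 lines, each homozygous mutant at a different site, leaves every site only 1/25 mutant, whereas a single control carrying all mutations on spliced fragments is fully mutant at every site.

```python
from fractions import Fraction


def pooled_mutant_fraction(lines: list[dict[str, int]], sites: list[str]) -> dict[str, Fraction]:
    """Fraction of mutant alleles at each site after mixing equal DNA masses.

    Each line maps site -> number of mutant alleles (0, 1, or 2) in a diploid genome;
    sites absent from a line are treated as homozygous normal.
    """
    total_alleles = 2 * len(lines)
    return {site: Fraction(sum(line.get(site, 0) for line in lines), total_alleles)
            for site in sites}


sites = [f"site{i}" for i in range(1, 26)]

# 25 lines, each homozygous mutant at a different site, pooled in equal amounts.
pooled = pooled_mutant_fraction([{site: 2} for site in sites], sites)
print(pooled["site1"])   # 1/25: normal alleles outnumber mutant ones 24 to 1

# One control carrying every mutation homozygously on spliced fragments.
single = pooled_mutant_fraction([{site: 2 for site in sites}], sites)
print(single["site1"])   # 1
```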

In contrast, laboratories have synthesized 33 homozygous controls on about 17 gene fragments. Since then, weekly PCR assays have amplified all 33 mutations in three different tubes and tested the amplified products on two test strips. It is also known to synthesize and test 100 heterozygous mutation controls, which represent the optimal means to control mass spec analysis of each of these mutations independently or in selected multiplex reactions. The heterozygous control is optimal for mass spec because the peaks reflecting the molecular time-of-flight of both the normal and mutant alleles are observed in the control tube, which serves as an additional control for each independent mass spec analysis.

Artificial chromosomes have been constructed using centromeric alpha-satellite sequences, 171 bp monomeric repeat units usually arranged head-to-tail in long arrays (Willard, H. F. (1991) Evolution of Alpha Satellite. Current Opinion in Genetics and Development 1:509-514). One means to construct artificial chromosomes is by adding cloned repeated centromeric sequences to cells (Willard H F. (1998) Centromeres: The missing link in the development of human artificial chromosomes. Curr Opin Genet Dev 8(2):219-225; Grimes B R, Rhoades A A, Willard H F. (2002) Alpha-satellite DNA and vector composition influence rates of human artificial chromosome formation. Mol Ther 5(6):798-805; Choo, K. H. A. (1997) The Centromere. Oxford University Press. These published reports are incorporated in entirety by reference). Additional selected control sequences can be added to artificial chromosomes analogous to the protocols used to add genes intended to correct genetic defects. These means may include selectable gene markers that provide resistance to a selecting agent, so that host cells lacking them cannot survive when grown under selective conditions.

At the same time, mouse models of human genetic disease have been constructed using hit and run vectors. Circular vectors with unique genomic target DNA, typically including a mutation, along with a selectable gene can be inserted into a host cell genome by homologous recombination. The cells with the incorporated recombinant selectable genes are then grown in medium that kills the cells lacking the selectable gene. The surviving cells carry the inserted vector, typically at the targeted site. This method provides a means to introduce mutant sequences into (1) normal cells from any normal individual in the species, to construct heterozygous cell lines, and (2) cell lines from distantly related organisms, so that only the mutant sequences are detected and tested by the assay to be controlled.

A. Multiple control sequences synthesized according to the first submitted patent can be cloned into vectors with selectable genes and gene sequences that enhance insertion of the vector into a host genome and selection for the cells that retain these inserted sequences. In fact, “hit and run” vectors have been used to construct mutant mouse strains that mimic human genetic disease phenotypes. When the vector with its partial gene sequence recombines precisely at the homologous gene sequence within the target cell genome (the hit), the vector sequences, including its selectable gene markers and promoters as well as the DNA fragment(s) cloned into it, are all incorporated into the genomic site homologous to the gene sequence cloned into the vector. This introduces another control gene copy with the mutant, polymorphic, or variant sequences inserted by PCR or site-directed mutagenesis, along with a selectable gene that allows only the cells with the targeted recombinant construct to survive in selectable medium.

When the two homologous sequences juxtaposed to each other following the hit loop to bring the sequences together, and the double-stranded DNA denatures and two different strands reanneal, breakage and recombination can occur to excise the inserted vector and selectable gene markers (the run), leaving behind either the original gene sequence or the introduced mutated gene sequence. In the second scenario, the gene on one chromosome is mutated. This would be sufficient to produce a homozygous mutant (hemizygous, with no normal copy) of an X-linked gene in a male organism. This approach can be used to synthesize multiple homozygous mutant controls for hemizygous X-linked genetic disease genes in male cells.
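Which sequence the run leaves behind depends on where the two crossovers fall relative to the mutation. The following idealized string model is a sketch of the standard insertion and excision logic, not a procedure from this application: it assumes perfect homology, a single base change, and one crossover for each step. The function names and sequences are hypothetical. If the hit and the run cross over on the same side of the mutation, the original sequence is restored; on opposite sides, the mutation is retained.

```python
def pop_in(left: str, wt: str, mut: str, backbone: str, right: str,
           i: int) -> tuple[str, str, str, str, str]:
    """Integrate a circular vector by a single crossover at offset i of the homologous region.

    The chromosome then reads left + copy1 + backbone + copy2 + right,
    with the homologous region duplicated around the vector backbone.
    """
    if len(wt) != len(mut) or not 0 < i < len(wt):
        raise ValueError("crossover must fall inside equal-length homologous regions")
    copy1 = wt[:i] + mut[i:]
    copy2 = mut[:i] + wt[i:]
    return left, copy1, backbone, copy2, right


def pop_out(integrated: tuple[str, str, str, str, str], j: int) -> str:
    """Excise the backbone (and its selectable marker) by a crossover at offset j
    between the two duplicated copies; one hybrid copy remains."""
    left, copy1, _backbone, copy2, right = integrated
    if not 0 < j < len(copy1):
        raise ValueError("crossover must fall inside the duplicated region")
    return left + copy1[:j] + copy2[j:] + right


wt, mut = "AAAAACAAAA", "AAAAATAAAA"                   # hypothetical C>T change at offset 5
integrated = pop_in("GG", wt, mut, "marker", "CC", i=2)  # hit upstream of the change
print(pop_out(integrated, j=8))  # GGAAAAATAAAACC: opposite side, mutation retained
print(pop_out(integrated, j=3))  # GGAAAAACAAAACC: same side, original sequence restored
```

In practice the run is detected by loss of the selectable marker, and the surviving clones are then genotyped to find those that kept the mutation.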

Alternatively, if this procedure is followed for an autosomal gene in a diploid organism with one autosome from each parent carrying one gene copy, then a normal individual's cells with two normal gene copies will be converted to heterozygous cells with one abnormal gene segment in one gene copy and one normal gene segment in the unchanged gene copy on the other chromosome. When this procedure is completed in mouse cells, these heterozygous mutant cells can be injected into blastocysts from a mouse with a different coat color, and the blastocysts (embryos) implanted into fertile hormone-prepared female mice. This often results in an implanted chimeric embryo that grows along with other littermates and is born as a chimeric mouse, identifiable by its coat color arising from cells of both the host blastocyst and the parent cell line carrying the mutated genes. When such chimeras are bred to normal mice, those whose germ line cells carry the mutant genes transmit the mutant sequences to their offspring, and offspring heterozygous for the gene can be isolated. In this fashion, heterozygous controls with the inserted gene mutations can be obtained, grown in large quantities as whole organisms or in culture, and the DNA harvested and marketed. This approach can be applied to prepare multiple heterozygous mutant, variant, or polymorphic controls for an autosomal genetic disease gene or locus.

The targeted gene does not need to be the assayed gene or a homologous gene modified with mutant sequences. Rather, any gene on a chromosome that is regularly propagated in every cell can be targeted. For instance, X chromosomal genes in male cell lines are quite good targets, because many X-linked genes are required for normal cell growth, so loss of the X chromosome will most likely result in cell death.

Alternatively, knocking out normal HGPRTase gene activity leaves the cell resistant to azaguanine. Because the HGPRTase gene is X-linked and present in a single copy in male cells, placing a segment of the HGPRTase gene in the vector carrying the mutant gene sequences to be controlled, so that targeted insertion disrupts the gene, provides one additional selectable marker to deliver an independent control sequence to male cells.

Every time another gene fragment with mutant sequences must be introduced independently, another selectable gene marker will be required. Alternatively, the target cells may carry a mutant gene that becomes normal when the construct is inserted, as with the HGPRTase gene. At the same time, the number of fragments to be incorporated with individual selectable gene markers can be minimized by splicing as many gene fragments together as possible, as taught in the prior patent submission on the synthesis of multiple controls.
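Minimizing the number of spliced fragments, and therefore selectable markers, is a packing problem. The sketch below swaps in the well-known first-fit decreasing heuristic, which is not described in this application and does not guarantee the optimal grouping. It assumes a hypothetical maximum insert size per vector and ignores junction and primer overhead; the segment names and lengths are illustrative. Because the controls may be spliced in any order, segments can be grouped purely by length.

```python
def pack_segments(segments: dict[str, int], max_insert_bp: int) -> list[list[str]]:
    """Group mutation-bearing segments into as few spliced fragments as possible.

    First-fit decreasing heuristic; junction and primer overhead are ignored.
    Each returned group is one spliced fragment, i.e. one selectable marker.
    """
    fragments: list[list] = []  # each entry: [remaining_bp, [segment names]]
    for name, length in sorted(segments.items(), key=lambda kv: kv[1], reverse=True):
        if length > max_insert_bp:
            raise ValueError(f"{name} ({length} bp) exceeds the insert limit")
        for fragment in fragments:
            if fragment[0] >= length:
                fragment[0] -= length
                fragment[1].append(name)
                break
        else:
            fragments.append([max_insert_bp - length, [name]])
    return [names for _, names in fragments]


# Hypothetical mutation-bearing segments (bp) and a 600 bp insert limit.
groups = pack_segments({"a": 400, "b": 300, "c": 300, "d": 200, "e": 100}, 600)
print(groups)  # [['a', 'd'], ['b', 'c'], ['e']]: three fragments, three markers
```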

After these heterozygous mice are identified by DNA analysis, two heterozygous individuals are bred, and on average 1 in 4 offspring are homozygous for the mutant cystic fibrosis alleles. Thus cells homozygous for the mutated gene can be isolated. This is one additional method to obtain homozygous mutant cells from which to isolate DNA to be used for multiple controls.
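The 1 in 4 expectation follows from enumerating the gametes of two heterozygous parents. The sketch below reproduces that Mendelian ratio and adds a hypothetical helper, not part of the original text, estimating how many offspring must be genotyped to obtain at least one homozygote with a chosen probability, assuming each offspring is independent.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Offspring genotype frequencies for one autosomal locus.

    Genotypes are two-letter strings, e.g. "Nm" (N = normal, m = mutant allele).
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def offspring_needed(p: float, confidence: float) -> int:
    """Smallest number of offspring giving at least one of a class with frequency p
    at the requested probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(cross("Nm", "Nm"))             # {'NN': Fraction(1, 4), 'Nm': Fraction(1, 2), 'mm': Fraction(1, 4)}
print(offspring_needed(0.25, 0.95))  # 11
```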

B. An alternative means to obtain a cell line with equal numbers of mutant and normal sequences is to construct a hit and run vector with two mutated sequences inserted instead of one. This can be done readily by adding selected sequences at the 5′ ends of the PCR primers. In this fashion, after a single DNA sequence has been amplified with one or more mutant sequences, two additional PCR amplifications are performed. In one preferred embodiment, two small aliquots of the same PCR-amplified DNA segment are amplified. The two upstream primers differ from each other so that the final products can be distinguished; for instance, they may carry two different restriction enzyme sites, each with a few flanking basepairs added. Each downstream primer will have not only the original primer sequence but also an additional sequence about 22 basepairs long that is the reverse complement of the primer used in the other PCR reaction. (Figure X). PCR amplification will then produce two different products that can be mixed and amplified again using only the primers with restriction enzyme sites. A single fragment about twice as long as either original PCR product can now be amplified; for the most part, it is a palindrome within a single strand (Lebo et al., 1983). Restriction enzyme digestion of both ends can produce a fragment with different sticky ends that can be directionally cloned into a vector whose cloning site has both restriction enzyme sites. It is noted that a vector with multiple rare or common cloning sites can be constructed as taught in U.S. patent application Ser. No. 10/236,168.
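The structure of the final joined product can be checked in silico before any PCR is run. This minimal sketch models only the end product, not the primer tails or the amplification steps, and omits the extra flanking bases a real restriction site needs. The fragment is hypothetical, and EcoRI (GAATTC) and BamHI (GGATCC) are illustrative choices for the two different end sites. The doubled control fragment is a perfect palindrome, while the two different end sites break the symmetry so the insert can be cloned directionally.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def inverted_duplication(fragment: str, site_a: str, site_b: str) -> str:
    """Joined product read on one strand: site A, the control fragment,
    its reverse complement, then the reverse complement of site B."""
    return site_a + fragment + reverse_complement(fragment) + reverse_complement(site_b)


def is_self_complementary(seq: str) -> bool:
    """True if the sequence equals its own reverse complement (a DNA palindrome)."""
    return seq.upper() == reverse_complement(seq)


fragment = "ATGCGTTA"                                            # hypothetical control fragment
construct = inverted_duplication(fragment, "GAATTC", "GGATCC")   # EcoRI and BamHI ends
core = construct[6:-6]                                           # strip the two 6 bp sites
print(construct)                         # GAATTCATGCGTTATAACGCATGGATCC
print(is_self_complementary(core))       # True: the doubled fragment is a palindrome
print(is_self_complementary(construct))  # False: the different end sites break symmetry
```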

Now using this vector with two copies of the multiple abnormal controls inserted, the “hit” portion of the procedure described in Section IIA can be completed. This will insert two abnormal sequences into a gene with two originally normal alleles, one on each homologous chromosome inherited from each parent. This results in a single cell with equal numbers of normal and mutant allelic target sequences (two of each). This cell, chosen for its known long term propagation characteristics [add cell line used in chromosome sorting] will produce a renewable source of heterozygous DNA controls that can be used as a control to compare and quantify the target sequences. This solves the problem of constructing a heterozygous control with the same number of normal and abnormal sequences in the same cell line.

C. Preparing a cell line with homozygous mutant controls requires (1) synthesizing and splicing the controls into a selectable vector, adding another homologous gene sequence found only in the cell line of the unrelated species from which the target cell line is derived, and placing the multiple mutations into this cell line. (2) Alternatively, a cell line with a deletion of a sex-linked gene or a cell line with two (homozygous) deletions of the gene for which controls are being prepared will serve as a suitable alternative. However, this will typically not be available for all genes to be controlled. (3) Yet another means to obtain a cell line with only homozygous mutations is to insert a vector with two mutant sequences into the X chromosome, hybridize the human cell to a rodent cell, and select for the retention of the human X chromosome as the other chromosomes are lost randomly during cell division.

The first step is to select a robust cell line from a species that is sufficiently distant evolutionarily that genes with similar functions do not interfere with the assay for which the control is being developed. For instance, salmon sperm DNA has typically been added to cDNA hybridization mixtures, since the first use of Southern blot restriction enzyme analysis, to bind most of the nonspecific binding sites on the nitrocellulose or nylon filter to which the restriction enzyme digested DNA had previously been bound (Maniatis). The specific binding of the radiolabeled globin gene probe could then be readily discerned over the remaining nonspecific binding of the radiolabeled probe to the filter. Thus salmon sperm genomic DNA is quite suitable to serve as carrier DNA for globin gene mutation analysis. DNA from multiple organisms can be tested to determine the most suitable carrier for other tested genes, such as drosophila (or any other insect), chicken (bird), bat or whale (mammal), yeast (single-celled fungus), or bacteria. For applications with RNA, yeast or mammalian placental tRNA or mRNA are two alternative carrier sources. It is noted that mammals include multiple orders, represented by man, mice, bats, elephants, bear, sea lions, cats, whales, cattle and horses (reference). Total DNA (or RNA) from a candidate carrier needs to be tested in parallel with any other normal DNA from the species being assayed and found to give no competing signal that can be confused with the signal from the sample being tested. This application requires whole living cells that can be cultured readily in the laboratory, including any of those selected above.

The robust cell line can be established from a skin biopsy of the required organism or purchased from cell repositories, provided these repositories do not prohibit modifying the cell line to make it more useful and then distributing products derived from it. Only immortalized cell lines can be used for this purpose. However, if the means of immortalization results in unstable chromosome complements that can change the ratio of introduced recombinant sequences to total genomic DNA, then each batch of prepared controls must be verified for all the controls grown in the cell line, by comparison with an aliquot of DNA isolated from an earlier culture judged suitable for the application as well as with any other suitable normal DNA controls. At the same time, some permanent cell lines, like the 46,XX and 46,XY cell lines (GM and GM) used for chromosome sorting, have stable chromosome complements. Using such a line merely requires monitoring its suitability; many frozen aliquots of cells already found to be suitable can be stored and thawed quickly whenever the current culture evolves to become unusable. The number of target sites relative to the number and identity of the chromosomes in the cell line can be monitored readily by fluorescence in situ hybridization, using the vector and inserts subsequently placed into the selected cell line.

The next step is to splice the multiple mutant controls together into a minimal number of DNA fragments. Each fragment is then introduced into a selectable vector along with a short gene sequence unique to the chosen target cell line. Each vector is then introduced into the chosen target cells, one vector per target cell culture, and the cells are selected for retention of the gene on the vector. With selectable gene markers, typically only the cells expressing the marker gene grow well in the culture medium. Cells are then selected from each target culture to find recombinant daughter cells that retain the delivery vector and the mutant gene sequences. Fluorescence in situ hybridization is one means of determining how many copies of the vector are incorporated into each cell line and at which target chromosome location(s). Typically a cell line carrying one vector will be the construct of choice for mutation control production. If 25 mutations are cloned into 8 fragments, a target cell line whose genome contains about one-eighth as much DNA as that of the tested species can be used: the eight cell lines are grown, their DNA is quantified, and equal quantities of DNA from each cell line are mixed and provided as a control to laboratories requesting this material. Alternatively, cells from organisms with even less DNA can be used, with their DNA mixed with DNA from the same organism that lacks mutant control sequences, so that the total genomic DNA carries the correct number of mutant DNA sequences relative to the total DNA in each cell of the tested species.
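The one-eighth figure follows from mass arithmetic. As a minimal sketch, assuming that DNA mass is proportional to genome length, that genome sizes are counted per cell at the same ploidy for both species, and that every recombinant line carries the stated number of vector copies per cell, the hypothetical functions below return the copies of each control fragment per genome equivalent of the tested species in a pooled mixture, and the fraction of unmodified filler DNA needed when the target organism's genome is smaller still.

```python
"""Copy-number arithmetic for pooling DNA from several recombinant cell lines.

Assumptions (illustrative): DNA mass is proportional to genome length in bp,
genome sizes are per cell at the same ploidy for both species, and every
recombinant line carries the same number of vector copies per cell.
"""


def copies_per_tested_genome(copies_per_cell: float,
                             target_genome_bp: float,
                             tested_genome_bp: float,
                             n_lines: int,
                             filler_fraction: float = 0.0) -> float:
    """Copies of each control fragment per genome equivalent of the tested species.

    Equal masses of DNA from `n_lines` recombinant lines are pooled, and a
    fraction `filler_fraction` of the final mass is unmodified DNA from the
    target organism.
    """
    mass_share_per_line = (1.0 - filler_fraction) / n_lines
    # Per unit of pooled mass: (share / target genome) recombinant cells versus
    # (1 / tested genome) genome equivalents of the tested species.
    return copies_per_cell * mass_share_per_line * tested_genome_bp / target_genome_bp


def filler_fraction_needed(copies_per_cell: float,
                           target_genome_bp: float,
                           tested_genome_bp: float,
                           n_lines: int) -> float:
    """Unmodified-DNA fraction that brings each fragment to exactly one copy."""
    f = 1.0 - n_lines * target_genome_bp / (copies_per_cell * tested_genome_bp)
    if f < 0:
        raise ValueError("target genome too large to reach one copy per genome equivalent")
    return f


if __name__ == "__main__":
    tested = 6.4e9  # illustrative tested-species genome size (bp per cell)
    print(copies_per_tested_genome(1, tested / 8, tested, n_lines=8))
    print(filler_fraction_needed(1, tested / 16, tested, n_lines=8))
```

With eight lines, one vector copy per cell, and a target genome one-eighth the size of the tested genome, each fragment is present at one copy per tested-genome equivalent. Halving the target genome again calls for half of the final DNA mass to come from unmodified cells of the same organism, which corresponds to the alternative described in the last sentence above. The genome size used is a round illustrative number.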

D. Artificial centromeres have been constructed from cloned arrays of the 171 bp alpha-satellite repeat common to human chromosome centromeres. The DNA in these artificial centromeres is replicated along with the DNA in the other chromosomes, and during each subsequent mitotic division the centromeric repeats segregate to the daughter cells with a high degree of reliability for many cell generations (Willard et al). These structures have therefore been proposed as a means to introduce whole normal genes into cells lacking active copies of those genes in order to cure human genetic disease. This application teaches that the same protocol can be used to introduce synthesized mutant gene fragments into previously constructed artificial chromosomes. When selectable gene markers are introduced simultaneously, the recombinant centromere can be selected so that the cells carrying it outgrow the other cells and take over the culture.

Because artificial centromeres segregate along with the other chromosomes, progressively larger amounts of genomic sequence can be added, together with an increasing number of selectable genes. In this fashion, two or more mutant DNA sequences could be inserted, and these sequences could be much longer. For instance, it should be possible to add the entire cDNA of a long gene like dystrophin (about 14 kb) without exceeding the carrying capacity of the artificial centromere. Many control sequences for many different genes could therefore be incorporated into the same artificial centromere, so that fewer cell lines will need to be constructed and marketed to control for the ever-expanding repertoire of reported disease gene loci and their common mutations. This is particularly useful for the controls required for microchips that can test hundreds to thousands of individual DNA sequences simultaneously.

Alternative Embodiments

As described in U.S. application Ser. No. 10/236,168, controls can be synthesized by PCAR as well as by site-directed mutagenesis. Other embodiments for synthesizing the multiple-mutation controls include synthesizing DNA fragments that themselves act as controls or test primers, or that act as PCR primers that can be annealed and extended by DNA polymerase to obtain progressively longer DNA fragments for subsequent use or manipulation according to the protocols taught in the original application and expanded upon in this filing.

These embodiments also include synthesizing controls long enough to serve as controls for restriction enzyme analysis. This may simply involve specific single or multiple nucleotide changes that abolish or create a restriction enzyme site, or PCR primers modified so that a restriction enzyme site is created at a specific site in only one of the amplified alleles. The same approach can be used to prepare sequences longer than typical PCR-amplified fragments for use as controls for restriction enzyme analysis with or without Southern blotting. PCR often replaces Southern blot analysis for routine RFLP testing. However, restriction enzyme digestion, with or without Southern blotting, may still be required to detect (1) very long trinucleotide repeat expansions or (2) methylated sequences reflecting gene inactivation by imprinting, X inactivation, or other gene-specific modification of gene expression.
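For the simplest case above, a single-nucleotide change that creates or destroys a recognition site, a short in silico check can confirm which allele a digest would cut before a control is synthesized. This sketch handles only exact (non-degenerate) recognition sequences; the function names are hypothetical, and the EcoRI site GAATTC and the toy sequence are illustrative, not CFTR sequence.

```python
"""Does a single-nucleotide change create or abolish a restriction site?

Handles exact (non-degenerate) recognition sequences only; the EcoRI site
and the toy sequence in the example are illustrative.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> set[int]:
    """Top-strand start positions of `site` found on either strand."""
    hits: set[int] = set()
    for motif in {site, revcomp(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.add(start)
            start = seq.find(motif, start + 1)
    return hits


def site_change(seq: str, pos: int, alt: str, site: str) -> str:
    """Effect on `site` of substituting base `alt` at 0-based position `pos`."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    before, after = site_positions(seq, site), site_positions(mutant, site)
    gained, lost = after - before, before - after
    if gained and not lost:
        return "creates"
    if lost and not gained:
        return "abolishes"
    if not gained and not lost:
        return "no change"
    return "creates and abolishes"


if __name__ == "__main__":
    normal = "TTGAATTCTT"  # contains GAATTC (EcoRI) at position 2
    print(site_change(normal, 4, "G", "GAATTC"))
```

When the result is "creates", the mutant allele is cut and the normal allele is not (the reverse for "abolishes"), so a synthesized control carrying the change should reproduce that digestion pattern.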

CONCLUSIONS, RAMIFICATIONS, AND SCOPE

U.S. application Ser. No. 10/236,168 (which is incorporated entirely herein, including definitions, by this reference) taught that every one of the 100 selected cystic fibrosis mutations could be synthesized readily, even when patient material was unavailable and only the published sequence was available. This even included a 10 kb deletion, synthesized by judicious selection of primers that amplified the upstream and downstream sequences independently before splicing the two together, even though the original amplified sequences were 10,000 base pairs apart. As taught herein, these amplified sequences, which may be cloned, can be diluted to resemble the unknown genomic samples tested by the DNA assay. Dilution can be carried out with any RNA or DNA that is sufficiently different at all loci not to interfere with the assay; alternatively, the sequences can be introduced into living cells so that they are propagated along with the other genomic DNA in the same proportion of gene loci to total DNA as in the samples being tested. Furthermore, these multiple controls can be introduced not only into chromosomes of any plant or animal species whose cells can be cultured in the laboratory, but also into artificial chromosomes, which can be synthesized for selected, frequently tested species and used to carry many fragments of control gene sequences.

This patent application also teaches how to distinguish multiple controls that are added to the same reaction mixture as the unknown total DNA to be tested, either at the same time or at a different time. The controls can be distinguished by labeling their DNA with reporter molecules whose readily detectable characteristics do not overlap with those of the reporter labeling the sequences tested from the unknown DNA sample. These can be major differences in fluorescence, such as infrared-labeled control sequences and fluorescein-labeled unknown sequences. This includes both described and yet-to-be-described reporter molecules, such as those that phosphoresce and dyes that act within the visible or non-visible portions of the spectrum.

Finally, this patent application teaches how to prepare noninfectious RNA products for comparison with RNA viruses as well as with RNA gene products.

Taken together, multiple controls allow these reagents to be prepared and used wherever they are needed with the available technology, improving quality control of manufactured laboratory test formats, laboratory-prepared reagents (home brew), and incomplete or complete manufactured test kits. These improvements will benefit quality assurance programs, which can use these reagents to test laboratory proficiency as well as to monitor laboratories' self-monitoring and reliability during inspections and in response to other requests for materials. Best of all, consumers will be assured of the continuing optimization of all reported laboratory tests of RNA and DNA, so that decisions based on the results can be made with greater confidence that the underlying data are as reliable as possible.

The scope of this patent may include all organisms on this planet that use DNA and/or RNA as their genetic material, including viruses, bacteria, plants, and animals. Optimal controls for length polymorphisms will yield highly reliable DNA “fingerprints” at multiple polymorphic loci, so that persons arrested for an unrelated crime can be connected to other crime scenes or victims even when those incidents occurred months or years earlier. Optimal trinucleotide repeat controls can be used to readily distinguish the normal, gray zone, and affected categories of genetic diseases arising from trinucleotide repeat expansions. Using highly reliable polymorphic controls to measure short nucleotide repeats (see Lebo et al., 2001), the identity and relationships of persons in subsequent generations can be compared with records of DNA results obtained in this generation, with numerous applications.
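A repeat-sizing control is useful only if it is assigned to its expected category. As a sketch, assuming the repeat is read from sequence as an uninterrupted run (an interruption such as CAA ends the run here) and using placeholder thresholds rather than any gene's published cut-offs, the sizing and classification steps look like this; function names are hypothetical.

```python
"""Count an uninterrupted trinucleotide repeat and assign a category.

Category thresholds differ from gene to gene; the values used in the example
are hypothetical placeholders, not clinical cut-offs.
"""
import re


def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Number of units in the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


def classify(repeats: int, normal_max: int, gray_max: int) -> str:
    """Map a repeat count to normal / gray zone / affected."""
    if repeats <= normal_max:
        return "normal"
    if repeats <= gray_max:
        return "gray zone"
    return "affected"


if __name__ == "__main__":
    control = "TT" + "CAG" * 35 + "CAA" + "CAG" * 3 + "TT"  # toy control allele
    n = longest_repeat_run(control)
    print(n, classify(n, normal_max=30, gray_max=40))
```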

DEFINITIONS

Definitions are incorporated by reference from all of the inventors' patents (Lebo and Ravetch, 1998, 1999; Lebo, 1997a; 1997b; Lebo et al., 2002a; 2002b).

The present invention pertains to the art of genetic testing, and specifically to methods and apparatuses for constructing and using genetic test controls to optimize the quality control of tests for the detection of genetic diseases and genetic anomalies resulting from deletions, translocations, and polymorphisms in the genetic code.

A variety of methods are presently practiced for identifying and detecting anomalies and errors in an individual's genetic code that portend adverse or negative phenotypic outcomes. These tests are often used to advise individuals on the prospect that their children may be carriers of a disease-causing genotype or will be affected with the disease. In other circumstances, these tests may be used to determine whether a fetus is likely to be a carrier of an abnormal genotype or afflicted with a genetic disease resulting from an abnormal genotype. As scientific understanding of the genetic code increases and we become better able to identify which portions of it, singly or in combination, are responsible for the phenotypic traits of the associated person (or animal, plant, or insect), including both disease traits and normal, variable physical traits and predispositions found within the population, genetic testing for the underlying markers of these traits is expected to be used more and more. Accordingly, more and more important decisions will be made on the basis of these test results, concerning matters such as childbearing (whether to have children and whether to carry fetuses to full term), treatment regimens, and even strategies for eradicating diseases or traits deemed less desirable by society.

Given the nature and gravity of the decisions that are being made on the basis of genetic test results, it is increasingly important that the tests be properly controlled to improve the quality and accuracy of the test results. An error in the test may return a false positive or negative result, which could affect a decision and course of conduct adversely. While controls are routinely incorporated into genetic tests, there is an ongoing need for better, more complete controls and for improved testing methods and procedures that provide improved quality control over the results.

Many genetic tests, which incorporate processes such as fluorescence in situ hybridization, PCR, and reverse transcription PCR, use labeled or otherwise identifiable pre-selected nucleotide sequences, which may be labeled probes or primers, to detect abnormalities or anomalies in the nucleotide sequence at the gene locus or loci to be tested. These probes or primers may include either a “normal” nucleotide sequence (i.e., the nucleotide sequence, or portion thereof, found in individuals who are not carriers of a defect in the gene locus of interest) or one or more “abnormal” nucleotide sequences (i.e., a nucleotide sequence representing all or a portion of a known mutation at the gene locus). These probes or primers are mixed with an unknown sample of DNA (unknown in the sense that the presence of mutations at pre-selected gene loci of interest has not been established), and the presence of normal or abnormal sequences at the gene locus is detected by determining whether, and which of, the probes or primers adequately hybridize to the sequence of the unknown DNA sample.
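At a single locus, the normal/mutant probe logic above reduces to a small decision table. The sketch below assumes each probe result has already been reduced to a signal compared against a hypothetical detection threshold and ignores quantitative dosage; the names are illustrative.

```python
"""Genotype call at one locus from normal- and mutant-probe results.

Signals are reduced to detected / not detected against a hypothetical
threshold; quantitative dosage is ignored.
"""
from enum import Enum


class Call(str, Enum):
    HOMOZYGOUS_NORMAL = "homozygous normal"
    HETEROZYGOUS = "heterozygous"
    HOMOZYGOUS_MUTANT = "homozygous mutant"
    NO_CALL = "no call"


def call_genotype(normal_signal: float, mutant_signal: float,
                  threshold: float = 0.5) -> Call:
    normal = normal_signal >= threshold
    mutant = mutant_signal >= threshold
    if normal and mutant:
        return Call.HETEROZYGOUS
    if normal:
        return Call.HOMOZYGOUS_NORMAL
    if mutant:
        return Call.HOMOZYGOUS_MUTANT
    # Neither probe hybridized: assay failure or a deletion spanning the locus.
    return Call.NO_CALL


if __name__ == "__main__":
    for signals in [(1.0, 0.0), (0.9, 0.8), (0.0, 1.0)]:
        print(signals, call_genotype(*signals).value)
```

A heterozygous control exercises only the "both detected" branch. A homozygous mutant control is what shows that the normal probe stays silent: if the normal probe also hybridizes to a homologous sequence, a homozygous mutant sample returns both signals and is miscalled heterozygous.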

To determine whether the probes or primers are active and working properly, it is important to include, in the genetic test, controls that are directed toward proving the efficacy of the probes or primers. The inclusion of controls in genetic testing is well known in the art.

At issue in the present invention is the fact that, for many genes whose mutations can give rise to phenotypically expressed disease or render an individual a carrier, detection of mutations with probes or primers as described above may be masked, and test results may be misleading, because homologous nucleotide sequences are located elsewhere in the sample. Homologous sequences may include pseudogenes. Pseudogenes are genomic DNA sequences so similar to normal genes that they are regarded as non-functional copies or close relatives of those genes. The presence of pseudogenes or other homologous sequences in the unknown test sample may “fool” the probes or primers into hybridizing with the homologous sequence instead of, or more likely in addition to, the sequence at the gene locus of interest. Of particular concern, a false negative reading for a homozygous mutation at a gene locus could result if a normal-sequence primer hybridizes to a homologous region of the sample and appears in the test to indicate that the normal sequence is present at the gene locus. It is also possible that a probe or primer for detecting a mutant sequence could hybridize to a sequence in a homologous region, indicating the presence of the mutation but falsely placing it at the gene locus of interest. To avoid these results, probes or primers should be selected so that they suitably distinguish the sequence at the gene locus from known homologous sequences. By selecting primers or probes that distinguish homologous sequences from the normal or mutant sequences at the gene locus, the false positive and false negative readings described above may be avoided.

Selecting appropriate probes and primers that can distinguish homologous nucleotide sequences from normal or mutant sequences being tested at the gene locus of interest is one aspect of the invention. Another aspect of the invention involves providing suitable controls for better ensuring that the selected probes or primers are, in fact, distinguishing the tested sequences from homologous sequences. A variety of different controls for this purpose are taught herein.

In one embodiment, the control may be total genomic DNA from which the target gene locus has been removed. Because the target gene locus is absent from the control, the selected primer or probe cannot hybridize with any portion of the nucleotide sequence of the gene locus. If a labeled primer or probe hybridizes with any region of the control, as demonstrated by a signal in the test, it can be assumed that the selected primer or probe is not adequately distinguishing normal or mutant sequences at the target gene locus from a homologous sequence elsewhere, and that the test results may be masking the true genotype at the target gene locus because of interactions with the homologous sequence.
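An in silico analogue of this deletion control can be used to screen a candidate probe before wet testing. The sketch below treats hybridization as a string match on either strand with an allowed number of mismatches, which is a strong simplification of real hybridization stringency; the function names are hypothetical and the sequences in the tests are toy values standing in for a locus and a one-mismatch pseudogene.

```python
"""In silico analogue of the deletion control: does a probe still find a match
once the target locus is removed?

Exact or near-exact string matching on both strands stands in for
hybridization; all sequences are toy examples.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_hits(genome: str, probe: str, max_mismatches: int = 0) -> list[int]:
    """Top-strand start positions where the probe or its reverse complement matches."""
    hits = []
    k = len(probe)
    targets = (probe, revcomp(probe))
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for t in targets:
            if sum(a != b for a, b in zip(window, t)) <= max_mismatches:
                hits.append(i)
                break
    return hits


def delete_locus(genome: str, start: int, end: int) -> str:
    """Control genome with the half-open interval [start, end) removed."""
    return genome[:start] + genome[end:]


if __name__ == "__main__":
    locus = "ACGTTGCAAGGCTT"
    pseudogene = "ACGTTGCTAGGCTT"  # one mismatch relative to the locus
    genome = "GGGG" + locus + "CCCCTTTT" + pseudogene + "AAAA"
    probe = locus[2:12]
    control = delete_locus(genome, 4, 4 + len(locus))
    print(probe_hits(genome, probe), probe_hits(control, probe),
          probe_hits(control, probe, max_mismatches=1))
```

In the toy example the probe matches only the true locus exactly, but once one mismatch is tolerated it also finds the pseudogene in the deletion control, which is the signal that would flag the probe as failing to distinguish the homologous sequence.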

In an alternate embodiment, the control may be total genomic DNA having the chromosome containing the target gene locus removed.

In yet another embodiment, the control may be a nucleotide sequence that includes the known sequences homologous to the target gene locus.

In still another embodiment, it may be possible to combine controls for different aspects of the genetic test so as to reduce the complexity of the test and the materials used in the test. Thus, for example, a control may include a mixture of nucleotide sequences representing the controls for homologous regions to the target gene locus and the mutations of interest. A control mixture of this nature would allow for confirmation, in a single panel, of the efficacy of the mutant primer or probe and the ability of the normal primer or probe to adequately distinguish the normal sequence at the target gene locus from homologous sequences contained in the control mixture.

In an alternative embodiment, the control may be a single nucleotide sequence having two portions, a first portion consisting of one or more homologous sequences and a second portion consisting of one or more mutant nucleotide sequences.

Also disclosed is a method of genetic testing wherein the method includes testing for the presence of a normal nucleotide sequence at a target gene locus, testing for the presence of one or more mutant nucleotide sequences at the target gene locus and controlling for interference by homologous nucleotide sequences by providing a control as described above.

What is taught herein includes a method of optimizing quality control in a genetic test assay, which may include a hybridization-based assay, wherein a nucleotide probe or primer is used, including testing for the presence of a normal gene nucleotide sequence portion at a pre-selected gene locus; testing for the presence of at least a first mutant gene nucleotide sequence portion at the pre-selected gene locus; and testing for interference by at least a first homologous nucleotide sequence portion. This method may include testing for interference by the homologous nucleotide sequence portion using a homologous nucleotide sequence control, wherein the homologous nucleotide sequence control may include at least a first homologous nucleotide sequence portion which is homologous to the normal gene nucleotide sequence of the pre-selected gene locus, and is devoid of a hybridizing nucleotide sequence portion of the pre-selected gene locus, wherein the hybridizing nucleotide sequence portion is sufficiently large to prevent detection of the pre-selected gene locus by the assay.

According to the above, the hybridizing nucleotide sequence portion may include the normal gene nucleotide sequence portion. Alternatively, the hybridizing nucleotide sequence portion may be substantially the entire gene nucleotide sequence of the pre-selected gene locus.

A control, adapted for use in an assay, wherein the assay is for the detection of mutations in the gene nucleotide sequence at the at least a first pre-selected gene locus, is also taught and may include at least a first nucleotide sequence, wherein the at least a first nucleotide sequence includes at least a first homologous nucleotide sequence portion and wherein the at least a first nucleotide sequence lacks a sufficiently large segment of the gene nucleotide sequence at an at least a first pre-selected gene locus to preclude detection of the gene nucleotide sequence at the at least a first pre-selected gene locus by an assay.

In one embodiment of the control just described, the at least a first nucleotide sequence may be total genomic DNA having a sufficiently large segment of the gene nucleotide sequence at the at least a first pre-selected gene locus removed.

In another embodiment of the control, the control may further include at least a second nucleotide sequence containing a sufficiently large segment of an at least a first pre-selected mutant gene nucleotide sequence so as to be detectable by the assay.

In yet another embodiment of the control, the at least a second nucleotide sequence may contain sufficiently large segments of at least second and third pre-selected mutant gene nucleotide sequences so as to be detectable by the assay.

In yet another embodiment of the control, the number of copies of the at least a first nucleotide sequence is approximately equal to the number of copies of the at least a second nucleotide sequence.

Also taught is a control for an assay which may include, a first nucleotide sequence portion, wherein the first nucleotide sequence portion may include at least a first homologous nucleotide sequence portion having a sufficient length to adequately imitate corresponding normal or mutant nucleotide sequence portions found at a pre-selected gene locus being tested in an assay and, which may contain at least a first distinct nucleotide species, wherein the at least a first distinct nucleotide species may not be found in either the normal or mutant nucleotide sequence portions and wherein the at least a first distinct nucleotide species is suitably distinct as to provide a means for confirmation that the first homologous nucleotide sequence portion is not being detected in the assay by a normal or a mutant sequence primer used in the assay; and also including a second nucleotide sequence portion, wherein the second nucleotide sequence portion includes a sufficiently large segment of a first mutant gene nucleotide sequence portion found at the pre-selected gene locus so as to be detectable by the assay.

In another embodiment of the control just described, the control may include at least a second homologous nucleotide sequence portion substantially adjacent the first nucleotide sequence portion, wherein the at least a second homologous nucleotide sequence portion has a sufficient length to adequately imitate corresponding normal or mutant nucleotide sequence portions of the pre-selected gene locus being tested in the assay and containing at least a first distinct nucleotide species, wherein the at least a first distinct nucleotide species is not found in either the normal or mutant nucleotide sequence portions and wherein the at least a first distinct nucleotide species is suitably distinct as to provide a means for confirmation that the at least a second homologous nucleotide sequence portion is not being detected in the assay by a normal or a mutant sequence primer used in the assay.

In yet another embodiment of the control just described, the control may include, a second nucleotide sequence portion having a sufficiently large segment of an at least a second mutant gene nucleotide sequence portion found at the pre-selected gene locus so as to be detectable by the assay.

Artificially synthesizing 29 homozygous cystic fibrosis core panel controls demonstrates that multiple homozygous mutant sequences can be placed on the same control DNA sequence. This streamlines quality control by minimizing extra control assays, time, and costly formatted test materials, while allowing all controls to be tested during every run. Any rare or unavailable reported DNA sequence can be PCR amplified from total genomic DNA using primer pairs synthesized with the designated mutation or variant sequence, paired with adjacent upstream and downstream primers. The adjacent upstream and downstream PCR fragments are then amplified together to obtain homozygous mutant controls. One to four homozygous mutations were artificially synthesized into each of 17 fragments, 433 to 933 bp long, by PCR amplification on a total normal human genomic DNA template; the fragments were then cloned into 9 vectors, with up to 3 fragments cloned into each multiple cloning site. Together these mutations included (1) all 25 mutations recommended by the American College of Medical Genetics (ACMG) for the core cystic fibrosis panel, including four cystic fibrosis mutations previously unavailable from any source (2184delA, 1078delT, 1898+1G->A, and I148T), and (2) four mutations on a single exon 11 fragment (G542X, G551D, R560T, and 1717-1G->A) and three mutations on a single exon 4 fragment (I148T, R117H, and 621+1G->T), cloned into the same multiple cloning site. When diluted with total yeast RNA, these cloned sequences provide homozygous controls for the multiplex 25-mutation cystic fibrosis PCR amplification test recommended by the ACMG. All 29 cloned homozygous mutations have been verified by sequencing, the gold standard for analyzing cloned DNA fragments. Cloned mutant sequences diluted with yeast RNA to the same concentration as the target gene sequence provide controls for all tested sequences throughout each PCR assay, maintaining a highly reliable multiplex test.
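Diluting the cloned sequences "to the same concentration as the target gene sequence" is a copy-number calculation. The sketch below converts DNA mass to molecule number and returns the plasmid mass that supplies two mutant copies (one per allele, as for a homozygous sample) per diploid genome equivalent. The constants (about 650 g/mol per base pair, a diploid genome of about 6.4 × 10⁹ bp) and the 6,000 bp plasmid size are assumptions for illustration, not values from this work, and the function names are hypothetical.

```python
"""Mass of cloned control plasmid that matches the target-gene copy number
in a given mass of genomic DNA.

Assumptions (illustrative): ~650 g/mol per base pair of double-stranded DNA,
a diploid genome of ~6.4e9 bp, two mutant copies per diploid genome for a
homozygous control, and a hypothetical 6,000 bp recombinant plasmid.
"""

AVOGADRO = 6.022e23         # molecules per mole
G_PER_MOL_PER_BP = 650.0    # average mass of one dsDNA base pair (assumption)


def copies_from_mass(mass_g: float, length_bp: float) -> float:
    """Number of double-stranded molecules of `length_bp` in `mass_g` grams."""
    return mass_g * AVOGADRO / (length_bp * G_PER_MOL_PER_BP)


def matching_plasmid_mass(genomic_mass_g: float,
                          plasmid_bp: float,
                          genome_bp: float = 6.4e9,
                          copies_per_genome: int = 2) -> float:
    """Grams of plasmid giving the same target-locus copy number as the genomic DNA."""
    genome_equivalents = copies_from_mass(genomic_mass_g, genome_bp)
    target_copies = copies_per_genome * genome_equivalents
    return target_copies * plasmid_bp * G_PER_MOL_PER_BP / AVOGADRO


if __name__ == "__main__":
    genomic = 100e-9  # 100 ng of genomic DNA per reaction (illustrative)
    plasmid = matching_plasmid_mass(genomic, plasmid_bp=6000)
    print(f"{plasmid * 1e12:.3f} pg plasmid per {genomic * 1e9:.0f} ng genomic DNA")
```

Under these assumptions, 100 ng of genomic DNA is matched by only about 0.19 pg of plasmid, which illustrates why the cloned controls are handled in a carrier such as yeast RNA rather than on their own.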

Achieving 99% reliability for a 25-mutation test requires an individual mutation test reliability of 99.96% (about 1 incorrect result in 2,500 reported results). In contrast, achieving 99% reliability for a 100-mutation test requires an individual mutation test reliability of 99.99% (about 1 incorrect result in 10,000 reported results). As ever more complex tests are applied, simultaneously testing controls for all target sequences is required to maintain the most reliable molecular genetic analyses. Accordingly, the College of American Pathologists' Molecular Genetics Committee requires that, whenever possible, a control be included in an independent control reaction for each molecular test. At the same time, obtaining controls for each reported gene and its common mutations is typically the most difficult part of introducing any new robust molecular assay into a molecular laboratory's test menu.
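The reliability figures above follow if a panel result is correct only when every individual mutation result is correct and errors are independent, so the per-mutation reliability r must satisfy r^n = 0.99 for an n-mutation panel. A short sketch under those assumptions (function names hypothetical):

```python
"""Per-mutation reliability needed for a target overall panel reliability.

Assumes a panel result is correct only when every individual mutation
result is correct, and that individual errors are independent.
"""


def per_mutation_reliability(overall: float, n_mutations: int) -> float:
    """Solve r ** n_mutations == overall for r."""
    return overall ** (1.0 / n_mutations)


def one_error_in(overall: float, n_mutations: int) -> float:
    """Reported results per expected incorrect result at that reliability."""
    return 1.0 / (1.0 - per_mutation_reliability(overall, n_mutations))


if __name__ == "__main__":
    for n in (25, 100):
        r = per_mutation_reliability(0.99, n)
        print(f"{n:>3} mutations: r = {r:.4%} (about 1 error in {one_error_in(0.99, n):,.0f})")
```

This reproduces the rounded figures in the text: roughly 99.96% (about 1 error in 2,500) for 25 mutations and 99.99% (about 1 in 10,000) for 100 mutations.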

The cystic fibrosis transmembrane conductance regulator (CFTR) gene is mutated on both alleles in patients with cystic fibrosis (CF) and congenital bilateral absence of the vas deferens (CBAVD). Cystic fibrosis is the most common lethal genetic disease in Caucasians, affecting about 1 in 2,500 newborns, while CBAVD affects about 1 in 5,000 males. With improvements in therapy, about 2/3 of cystic fibrosis patients survive to age 16, but CF remains a progressive, lethal genetic disease. Over 1,000 mutations and 50 polymorphisms have been reported worldwide throughout much of the 24-exon CFTR gene, whose 4,600 bp coding sequence spans 190 kb of genomic DNA (Cystic Fibrosis Mutation Database).
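The incidence quoted above also implies an approximate carrier frequency, the quantity relevant to population carrier screening. Under Hardy-Weinberg assumptions (random mating, equilibrium, one autosomal recessive locus with all CF alleles pooled), an incidence q² of 1/2,500 gives q = 1/50 and a carrier frequency 2pq of about 1 in 25. A minimal sketch of that arithmetic, with hypothetical function names:

```python
"""Hardy-Weinberg link between disease incidence and carrier frequency.

Assumes random mating, Hardy-Weinberg equilibrium, and one autosomal
recessive locus with all disease alleles pooled into a single class.
"""
import math


def allele_frequency(incidence: float) -> float:
    """Disease-allele frequency q, from incidence = q ** 2."""
    return math.sqrt(incidence)


def carrier_frequency(incidence: float) -> float:
    """Heterozygote (carrier) frequency 2pq with p = 1 - q."""
    q = allele_frequency(incidence)
    return 2 * q * (1 - q)


if __name__ == "__main__":
    incidence = 1 / 2500
    c = carrier_frequency(incidence)
    print(f"q = {allele_frequency(incidence):.3f}, carriers about 1 in {1 / c:.1f}")
```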

The ACMG 25-mutation cystic fibrosis panel was selected by the Cystic Fibrosis Committee of the American College of Medical Genetics for screening pregnant Caucasian women, who are at greater risk of having a fetus with cystic fibrosis than women of other races (Grody et al., 2001). These 25 CFTR mutations were selected to include mutations with a frequency of at least 0.1% in cystic fibrosis patients. A panel of this size was considered an attainable goal: a large number of laboratories could offer it routinely at a reasonable fee that third-party payers would reimburse. In contrast, the more limited mutation sets previously tested by >90% of laboratories were deemed insufficient to detect an appropriate proportion of the >1,000 mutations found to cause cystic fibrosis. Currently well over 100 laboratories in Europe and North America screen pregnant Caucasian women for these 25 CFTR mutations (Genetests; Barker).

Although initially developed for population carrier screening, the 25-mutation core panel is also appropriate for screening patients with suspicious symptoms, because over 90% of all affected Caucasian patients, including over 99% of affected Northern European Caucasian patients, will test positive for at least one mutation (Lebo and Grody, Submitted). Although the 5T allele is designated only as a reflex test in carrier screening, it is appropriate to add it to the 25 most common mutations tested in symptomatic patients, because the 5T allele (with decreased penetrance) and the severe ΔF508 mutation are the two most common alleles resulting in cystic fibrosis-like symptoms, including CBAVD (Lebo and Grody, Submitted). Even though the ΔF508/5T genotype has lower penetrance, symptomatic patients should be tested for 5T because 0.17% to 3.4% of cystic fibrosis patients have one allele with the 5T sequence and no other detected mutation. Although some laboratories do not screen CBAVD patients for the 5T sequence, it should be tested because about 40% of CBAVD patients are compound heterozygotes for 5T (Lebo and Grody, Submitted).
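The link between the fraction of CF alleles a panel detects (d) and the fraction of affected patients who test positive for at least one mutation is the two-allele probability 1 - (1 - d)². For example, a panel detecting 70% of alleles would find at least one mutation in 91% of affected patients. A minimal sketch, assuming an affected patient's two alleles are detected independently with the same probability (population structure and allele correlation are ignored; the sensitivities in the example are illustrative, not published panel values):

```python
"""Per-patient detection from per-allele panel sensitivity.

Assumes an affected patient carries two disease alleles, each detected
independently with the same probability.
"""


def detection_probabilities(allele_sensitivity: float) -> dict[str, float]:
    """Outcome probabilities for a patient with two disease alleles."""
    if not 0.0 <= allele_sensitivity <= 1.0:
        raise ValueError("allele_sensitivity must lie in [0, 1]")
    d = allele_sensitivity
    return {
        "both alleles detected": d * d,
        "exactly one allele detected": 2 * d * (1 - d),
        "no allele detected": (1 - d) ** 2,
        "at least one allele detected": 1 - (1 - d) ** 2,
    }


if __name__ == "__main__":
    for d in (0.7, 0.9):  # illustrative per-allele sensitivities
        p = detection_probabilities(d)
        print(f"d = {d:.0%}: at least one allele detected in {p['at least one allele detected']:.1%}")
```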

When the ACMG Cystic Fibrosis Committee selected the 25-mutation panel, controls for all of these mutations were not available from any single source. Many laboratories therefore expanded their CFTR mutation testing panel to 25 mutations and began offering the service without simultaneously testing all the appropriate controls. To meet the substantial new demand for better controls, the Coriell Cell Repository (Camden, N.J.) aggressively collected, transformed, and distributed human cell lines carrying the available mutations among those to be tested. Despite substantial efforts to date, Coriell has been unable to obtain a collection of cell lines that together contain all 25 mutations. Coriell has been marketing total genomic DNA from cell lines transformed with Epstein-Barr virus as a product of Substantial Equivalence. Genzyme, which began testing about 87 mutations a decade ago, had to complete many thousands of tests over several years to identify patient DNA controls for all 87 selected mutations. Testing all 21 cystic fibrosis mutations in the Coriell collection with each set of unknown patient samples requires a substantial investment in labor and test materials. For this reason, dozens of laboratories performing 25-mutation tests rotate selected Coriell substantial-equivalence control DNAs through their regular runs of unknown samples, so that all available mutation controls are tested over a series of multiplex assays but never together in a single multiplex assay. In addition, most of the Coriell cell lines are heterozygous at the CFTR mutation site, so these controls cannot confirm that each assay distinguishes a homozygous from a heterozygous DNA sample.

The intended use of our construct is to mix all mutations that can be tested unambiguously in the same multiplex PCR reaction and to use the minimum number of mixtures in control assays to verify analysis of each abnormal site by multiplex analysis. Mixing all homozygous mutant clones together, however, may place the normal sequence at some sites in one cloned vector and the mutant sequence at the same sites in another; to obtain homozygous controls that are each robust in every assay tested, such clones must be kept in separate mixtures. Before offering a 25-mutation cystic fibrosis test at Akron Children's Hospital, our laboratory artificially synthesized and verified homozygous controls for each mutation tested on the test format.
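One way to group clones so that no site is covered by both a mutant and a normal copy in the same mixture is to treat clones with overlapping fragments as conflicting and to color the resulting conflict graph greedily. This is a sketch under stated assumptions, not the procedure used to assemble the mixtures in this work: overlap of fragment coordinates stands in for the real conflict (a mutant site in one clone lying inside a normal-sequence fragment of another), the clone names and coordinates are hypothetical, and greedy coloring guarantees conflict-free mixtures but not the minimum number of them.

```python
"""Greedy assignment of cloned controls to control mixtures.

Two clones conflict if any fragment of one overlaps any fragment of the
other. Greedy coloring always yields conflict-free mixtures but is not
guaranteed to use the minimum number; coordinates are hypothetical.
"""

Interval = tuple[str, int, int]  # (reference name, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same reference overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def clones_conflict(x: list[Interval], y: list[Interval]) -> bool:
    return any(overlaps(a, b) for a in x for b in y)


def assign_mixtures(clones: dict[str, list[Interval]]) -> list[list[str]]:
    mixtures: list[list[str]] = []
    # Place clones carrying the most fragments first, a common greedy ordering.
    for name in sorted(clones, key=lambda n: -len(clones[n])):
        for mix in mixtures:
            if not any(clones_conflict(clones[name], clones[other]) for other in mix):
                mix.append(name)
                break
        else:
            mixtures.append([name])
    return mixtures


if __name__ == "__main__":
    clones = {  # hypothetical clones: (exon, start, end) fragments
        "A": [("ex4", 0, 500)],
        "B": [("ex4", 400, 900)],
        "C": [("ex11", 0, 600)],
        "D": [("ex10", 0, 450), ("ex11", 700, 1200)],
    }
    print(assign_mixtures(clones))
```

Here clones A and B cover the same exon 4 region and land in different mixtures, while the non-overlapping clones share one.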

The design of the experiment was to synthesize every selected reported mutation or polymorphism, no matter how rare or unavailable, using PCR primer pairs carrying the mutant sequence, a genomic DNA template (Ho et al., 1989), and a proofreading (editing) DNA polymerase. This protocol was used to synthesize fragments that are not only homozygous for the desired mutation (FIG. 1A,B,C; Ho et al., 1989) but also heterozygous, with the mutant and normal sequences on different fragments (FIG. 1C). Multiple fragments were then joined by PCR and inserted into the multiple cloning site of the pCR-BluntII-TOPO cloning vector (see Materials and Methods). Fragments carrying 29 mutations were cloned into 9 independent clones, which were grown, mixed together with yeast RNA, and tested simultaneously with DNAs from patients with unknown CFTR alleles. Together, this control set has served in three multiplex PCR reactions and two test strips with the Innogenetics multiplex cystic fibrosis test kit to control for each of the 29 common mutation sites tested. This strategy reduces the effort and materials needed to run more than a dozen control reactions with permanent Coriell cell line DNAs to three PCR amplification reactions and two ASR detection strips.

Materials and Methods

Primers were selected from the CFTR gene sequence and from published mutations, using sequences obtained from The Genome Database (see Table 1, below), and were synthesized by Invitrogen. Pfu DNA Polymerase or Pfu Ultra High-Fidelity DNA Polymerase was selected because these enzymes have been reported to amplify target DNA sequences with the highest fidelity among all thermostable DNA polymerases (97.4% of 1 kb sequences identical after 20 PCR amplification cycles for Pfu; source: Stratagene Instruction Manual, La Jolla, Calif.). These enzymes amplified all CFTR target sequences with excellent fidelity (see below). PCR Grade dNTP Mix was purchased from Invitrogen (Carlsbad, Calif.). A typical reaction mix included 4 U Pfu DNA Polymerase, 10 μl of 10× Pfu polymerase buffer, 0.2 mM of each dNTP, 40-80 ng each of forward and reverse primers, 4 mM MgCl₂, and 25 μg total genomic DNA or 1 μl of PCR amplified reaction mix(es) as template in a final volume of 100 μl.
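The reaction recipe above can be scaled to a master mix with simple C1V1 = C2V2 arithmetic. The Python sketch below is illustrative only: the stock concentrations (10 mM dNTP mix, 25 mM MgCl₂, 2.5 U/μl polymerase, 40 ng/μl primers), the 60 ng primer amount, the 1 μl template volume, and the 10% pipetting overage are assumptions, not values from the text, and the MgCl₂ line assumes that all 4 mM is supplied from this stock.

```python
# Scale the per-reaction recipe to a master mix (C1*V1 = C2*V2).
# All stock concentrations below are illustrative assumptions.

def stock_volume_ul(final_conc, stock_conc, final_volume_ul=100.0):
    """Volume of stock that gives final_conc in final_volume_ul (same units for both concs)."""
    return final_conc * final_volume_ul / stock_conc

def master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale per-reaction volumes to n_reactions plus a pipetting overage fraction."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}

FINAL_VOLUME_UL = 100.0
TEMPLATE_UL = 1.0  # assumed; template is added to each tube, not to the master mix

per_rxn = {
    "10x Pfu buffer": stock_volume_ul(1.0, 10.0),        # 1x final
    "dNTP mix, 10 mM each": stock_volume_ul(0.2, 10.0),  # 0.2 mM each final
    "MgCl2, 25 mM": stock_volume_ul(4.0, 25.0),          # 4 mM final; assumes Mg-free buffer
    "Pfu, 2.5 U/ul": 4.0 / 2.5,                          # 4 U per reaction
    "forward primer, 40 ng/ul": 60.0 / 40.0,             # 60 ng, within the 40-80 ng range
    "reverse primer, 40 ng/ul": 60.0 / 40.0,
}
# Water brings each reaction to the final volume once template is added.
per_rxn["water"] = FINAL_VOLUME_UL - TEMPLATE_UL - sum(per_rxn.values())

if __name__ == "__main__":
    for name, vol in master_mix(per_rxn, n_reactions=8).items():
        print(f"{name:28s} {vol:8.2f} ul")
```

If the buffer already supplies Mg²⁺, the MgCl₂ volume should be reduced accordingly and the water recalculated.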

PCR amplification proceeded by denaturing at 94° C. for 75-90 sec in the first cycle and 30-45 sec in each subsequent cycle, annealing typically at 60° C. (range: 52° C.-72° C.) for 30 sec, and elongating typically at 72° C. (range: 68° C.-72° C.) for 120-240 sec, repeating this amplification 34 times. (Note: sufficient elongation times allowed optimal Pfu sequence editing.) The last cycle is followed by a further elongation step at 72° C. for 300-600 sec to complete synthesis of all double-stranded fragments, and the mixture is then stored at 0° C. prior to electrophoretic analysis of an aliquot of the PCR amplified product.
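As a rough planning aid, the cycling program above can be totalled in a few lines of Python. This sketch assumes that "repeating this amplification 34 times" means 34 cycles in all, with only the first cycle using the longer denaturation, and it ignores ramp times between temperatures, so actual run times on a thermocycler will be longer than the roughly 1.8 to 3.2 hours of hold time it reports.

```python
def cycling_time_s(n_cycles, first_denat_s, denat_s, anneal_s, elong_s, final_elong_s):
    """Total hold time in seconds; ramp times between temperatures are ignored."""
    first_cycle = first_denat_s + anneal_s + elong_s
    later_cycles = (n_cycles - 1) * (denat_s + anneal_s + elong_s)
    return first_cycle + later_cycles + final_elong_s

# Shortest and longest settings within the ranges in the text (assumed 34 cycles total).
shortest = cycling_time_s(34, first_denat_s=75, denat_s=30, anneal_s=30,
                          elong_s=120, final_elong_s=300)
longest = cycling_time_s(34, first_denat_s=90, denat_s=45, anneal_s=30,
                         elong_s=240, final_elong_s=600)
print(f"hold time: {shortest / 60:.0f} to {longest / 60:.0f} min")
```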

PCR amplified product that showed a single fragment of the correct length was cloned directly or joined to other PCR amplified fragments by PCR. The band containing the correct-length fragment was cut out of the gel, minced, and purified by spin column chromatography (Zymo Gel Purification Kit). The purified product was again analyzed by electrophoresis to estimate its concentration and confirm that the correct-length fragment had been isolated.

The unique PCR amplified fragments (11) were inserted into the pCR-BluntII-TOPO vector (1 μl; Zero Blunt TOPO PCR Cloning Kit, Version K, Invitrogen, Carlsbad, Calif.) according to the manufacturer's recommended protocol. After shaking horizontally at 200 rpm at 37° C. for 1 hr, variable volumes of the transformation were spread on LB plates containing kanamycin (0.1 mg/ml) and incubated overnight at 37° C. White colonies that grew overnight were selected and amplified in 5 ml L-broth with kanamycin overnight at 200 rpm. The Qiagen Spin Miniprep Kit (cat. no. 27104) was used according to the manufacturer's instructions to purify plasmid DNA from the overnight cultures. The insert was excised with a restriction enzyme and electrophoresed to confirm that it was the right size. The mutation(s) were then validated by sequencing (the gold standard for molecular biology) and further tested using a commercial cystic fibrosis test kit (Innogenetics).

In this fashion, the first 33 mutations tested on the Innogenetics test strips were synthesized into fragments using 4 primers for each (2 mutant and 2 flanking, FIG. 2), incorporating one or two mutations per synthesis into 12 target exons. Only two mutant primers and one flanking primer pair failed to produce a product of unique length and had to be reselected, synthesized, and PCR amplified. All other selected and synthesized primers successfully primed the desired mutant sequence for amplification without modification of the primer sequence.

Results

PCR Amplification with Synthesized Mutant Primers

Our goal was to synthesize and clone homozygous controls for all 25 ACMG cystic fibrosis mutations and the 5T mutation, for use by any laboratory running any commercial or laboratory-developed multiplex test. Thus all previously reported PCR primer sites were marked on the CFTR gene sequence for each exonic and intronic region to be tested. New primer sites were then selected by sequence inspection that (1) spanned all published primer sites, (2) did not contain more than three identical nucleotides in tandem, (3) whenever possible, had 45-60% GC content, (4) had a G or a C at the 3′ end to form a GC clamp where DNA polymerase initiates synthesis, and (5) were at least 23 nucleotides long to further assure unique priming.
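The sequence-level rules (2) through (5) can be checked mechanically. The sketch below is a hypothetical helper, not part of the original workflow; rule (1) depends on the genomic coordinates of published primer sites and is not modelled, and both example sequences are made up.

```python
import re

def gc_fraction(seq):
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def primer_problems(seq, min_len=23, gc_range=(0.45, 0.60), max_run=3):
    """Return a list of rule violations; an empty list means the primer passes."""
    seq = seq.upper()
    problems = []
    if len(seq) < min_len:                                    # rule (5)
        problems.append(f"length {len(seq)} < {min_len}")
    if re.search(r"([ACGT])\1{%d,}" % max_run, seq):          # rule (2): 4+ identical bases
        problems.append(f"run of more than {max_run} identical bases")
    gc = gc_fraction(seq)
    if not gc_range[0] <= gc <= gc_range[1]:                  # rule (3), a soft rule in the text
        problems.append(f"GC {gc:.0%} outside {gc_range[0]:.0%}-{gc_range[1]:.0%}")
    if seq[-1] not in "GC":                                   # rule (4): 3' GC clamp
        problems.append("3' end is not G or C")
    return problems

good = "AGCTTGACCTGAAGTCAGTCAACG"  # hypothetical 24-mer, 50% GC, ends in G
bad = "AGCTTGACCTGAAAAGTCAGTCA"    # hypothetical 23-mer with an AAAA run
print(primer_problems(good))
print(primer_problems(bad))
```

Because rule (3) applies only "whenever possible", a GC violation might be treated as a warning rather than a rejection in practice.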

These PCR primers were used to synthesize 29 mutations, including the 5T mutation, as controls for the Innogenetics test strips. The standard method of mutant sequence synthesis is to order and select primer pairs and set up PCR amplification reactions as illustrated (FIG. 2). Total genomic DNA from a normal patient served as the template. In the illustration, the adjacent G542X/G551D mutations were artificially synthesized by selecting primers at this site and changing the two nucleotides that had to be modified (Table 1, Exon 11). The reverse R11 primer (labeled R2 in FIG. 2) and the forward FG542X/G551D primer (F1 in FIG. 2) together amplified Product #1, downstream of the mutation site (FIG. 2, Section II). The forward F11 primer (labeled F2 in FIG. 2) and the reverse RG551D/G542X primer (R1) amplified the upstream fragment, Product #2 (FIG. 2, Section III). The flanking forward and reverse primers (F2 and R2) were then added to 1 μl aliquots of Products #1 and #2 and PCR amplified to give a single fragment flanked by the F2 and R2 sequences with the two mutations G551D and G542X in the middle (Product #3, FIG. 2, Section IV). This is how the first mutant primer sequence was added to each of the 17 subsequently cloned fragments (FIG. 1).

For fragments into which additional mutant sites needed to be introduced, another primer pair was synthesized for each site, such as 1717-1 [Table 1, Exon 11; labeled F3 and R3 in FIG. 2, CFTR Intron 10/Exon 11 (cont.)]. The mutation was then incorporated into Product #4 (FIG. 2, Section V) together with the upstream sequences and into Product #5 (FIG. 2, Section VI) together with the downstream sequences. Again, 1 μl aliquots of the reactions containing Product #4 and Product #5 were mixed with the upstream and downstream primers (F2 and R2) and PCR amplified to give Product #6 (FIG. 2, Section VII). Product #6 carries not only the original G542X/G551D mutations but also the 1717-1 mutation.

Forward and reverse R560T primers were then synthesized and used to introduce this mutant sequence into Product #6: the forward R560T primer with the downstream reverse primer R2 amplified Product #7, and the reverse R560T primer with the upstream forward primer F2 amplified Product #8 (FIG. 2, Sections IX and X). Products #7 and #8 were then joined and amplified with primers F2 and R2 to produce Product #9 (FIG. 2), which carries four mutations: the new R560T mutation along with the original three mutations 1717-1, G542X, and G551D.
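The stepwise procedure above follows the overlap-extension scheme of Ho et al. (1989). The Python sketch below models it as idealized string operations: each PCR copies its primers into the product, each primer is located by exact match of its 3′-terminal bases (an assumed 12-base anchor), polymerase errors are ignored, primers are an assumed 25 bases long, and the 300 bp template is random hypothetical sequence rather than CFTR. Applying the helper twice mirrors how Products #6 and #9 accumulated mutations.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def pcr(template, fwd, rev, anchor=12):
    """Idealized PCR: locate each primer by exact match of its 3'-terminal `anchor`
    bases, then return fwd + intervening template + reverse complement of rev.
    Mismatched (mutant) bases nearer a primer's 5' end are copied into the product."""
    rc_rev = revcomp(rev)
    fpos = template.find(fwd[-anchor:])
    rpos = template.find(rc_rev[:anchor])
    if fpos < 0 or rpos < 0 or rpos < fpos + anchor:
        raise ValueError("primer sites not found in the expected orientation")
    return fwd + template[fpos + anchor:rpos] + rc_rev

def overlap_join(left, right, min_overlap=15):
    """Anneal two products through their shared end sequence and extend them."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-n:] == right[:n]:
            return left + right[n:]
    raise ValueError("products do not overlap")

def soe_mutate(template, pos, base, flank=25, anchor=12):
    """Introduce one substitution at `pos` with flanking primers F2/R2 and
    complementary mutant primers F1/R1 centred on the site."""
    half = flank // 2
    f2, r2 = template[:flank], revcomp(template[-flank:])
    f1 = template[pos - half:pos] + base + template[pos + 1:pos + flank - half]
    r1 = revcomp(f1)
    upstream = pcr(template, f2, r1, anchor)    # like Product #2: F2 ... mutant site
    downstream = pcr(template, f1, r2, anchor)  # like Product #1: mutant site ... R2
    return pcr(overlap_join(upstream, downstream), f2, r2, anchor)  # re-amplify with F2/R2

rng = random.Random(1)
wild = "".join(rng.choice("ACGT") for _ in range(300))  # hypothetical template
b1 = "T" if wild[120] != "T" else "A"
b2 = "G" if wild[200] != "G" else "C"
double = soe_mutate(soe_mutate(wild, 120, b1), 200, b2)
print(sum(x != y for x, y in zip(wild, double)), "substitutions introduced")
```

The model assumes every 12-base anchor is unique in the template; real primer design must also exclude homologous sites elsewhere in the genome.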

The number of useful mutations that can be added to a single sequence is determined by the minimal distance between mutations required to distinguish unambiguously between normal and mutant sequences at each site by the assay that requires a multiplex control, i.e. (1) nitrocellulose filters with slot-blot locations, or microchips with unique locations, to which ASOs hybridize uniquely, or (2) mass spectrometry assays that hybridize complementary oligonucleotides adjacent to the mutations to be measured and then add nucleotides until a template base complementary to the single nucleotide omitted from the reaction mixture is encountered. For instance, with our controls tested on commercial nitrocellulose slot-blot formats, the controls are homozygous to assure that each control unambiguously gives a homozygous mutant signal because it does not cross-hybridize with the normal sequence under the assay conditions selected. Therefore, mutations too close together to be probed independently were placed on two different 434 bp Intron10/Exon11 fragments, one in the top clone with two fragments (FIG. 1, first insert, right fragment) and the other in the third clone with two fragments (FIG. 1, 3rd insert, right fragment). Another site where separate fragments had to be used was the adjacent three-basepair deletions ΔF508 and ΔI507 (FIG. 1, insert 3, left segment, and insert 4, left segment). Therefore, at least two mixtures of all 25 homozygous mutations are required to test every homozygous mutation at each assay site.
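Choosing which mutant sites can share a fragment is a small scheduling problem: two sites conflict when they lie closer than the assay's minimum spacing. A minimal sketch, assuming a hypothetical 20 nt spacing and approximate positions taken as codon number × 3 (the intronic 1717-1 site is omitted), assigns sites to mixtures first-fit in coordinate order. The grouping it returns differs from the one actually cloned, but, like the text, it needs two mixtures for these sites.

```python
def assign_mixtures(sites, min_spacing):
    """First-fit assignment of (name, position) sites to control mixtures so that no
    two sites in the same mixture lie closer than min_spacing nucleotides.

    Processing sites in coordinate order makes first-fit optimal for this interval-type
    conflict: the number of mixtures equals the largest number of sites that fall
    within any window of min_spacing nucleotides.
    """
    mixtures = []  # each mixture is a list of (name, position), in coordinate order
    for name, pos in sorted(sites, key=lambda s: s[1]):
        for mix in mixtures:
            # Only the last (right-most) site in a mixture can be too close.
            if pos - mix[-1][1] >= min_spacing:
                mix.append((name, pos))
                break
        else:
            mixtures.append([(name, pos)])
    return mixtures

# Approximate coding positions (codon number x 3); illustration only.
sites = [("dI507", 507 * 3), ("dF508", 508 * 3), ("G542X", 542 * 3),
         ("G551D", 551 * 3), ("R553X", 553 * 3), ("R560T", 560 * 3)]
mixes = assign_mixtures(sites, min_spacing=20)  # 20 nt: hypothetical probe footprint
for i, mix in enumerate(mixes, 1):
    print(f"mixture {i}:", ", ".join(name for name, _ in mix))
```

The appropriate spacing depends on the assay format (ASO probe length for slot blots, extension window for mass spectrometry), so the value would need to be set per platform.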

Three mutations were added to the Exon 4/Intron 4 fragment (R117H, I148T, and 621+1) using an analogous protocol with flanking PCR fragments and mutation-specific primer sequences (FIG. 2, Sections XII-XV illustrate insertion of the 621+1 mutant sequence). At this point two fragments had been synthesized, with 4 mutations in the first and 3 mutations in the second. These can be spliced together by synthesizing a primer that adds a sequence complementary to the illustrated R2′ site of Product #6 (FIG. 2, Section XVI) upstream of the F2 sequence used to amplify Product #9. PCR amplification with this new R2′/F2 primer and the R2 primer of Product #9 (FIG. 2, Section XVI) then yields a new Product #9′ whose 5′ end overlaps the 3′ end of Product #6. Next, 1 μl of Product #6 and 1 μl of Product #9′ were mixed and PCR amplified together with primers F2′ and R2 to produce Product #10 (FIG. 2, Section XVII). Because PstI and XbaI restriction enzyme sites were engineered into the ends of Product #10, it could be cloned into the PstI and XbaI sites of a multiple cloning site. Although we used this approach for the first construct, we found Invitrogen's pCR-BluntII-TOPO blunt-end cloning kit easier to use. Thus all later clones were inserted directly after PCR into blunt-ended cloning sites, and even the original construct was remade by cloning the blunt-ended PCR product directly into the vector. A simplified drawing of each of these 9 clones is shown in FIG. 1.

Each of the 17 synthesized fragments (FIG. 1) was tested at each stage of synthesis using Innogenetics test strips to verify that the mutation gave a unique signal before proceeding to the next step (except the ΔF508 mutation, which was cloned after PCR amplifying the site in deidentified DNA from a patient with this mutation; see below). Subsequently all 17 fragments were cloned into 9 clones.

Each of these 9 clones was then streaked and grown overnight on L-broth plus kanamycin plates, and individual white colonies were selected and grown in 5 ml suspension cultures of L-broth with kanamycin. Two 1 ml aliquots were frozen at −80° C., and plasmid DNA was prepared from the remaining 3 ml of culture using Qiagen miniprep kits. The 9 clones in the first panel of homozygous controls were then sequenced bidirectionally from aliquots of miniprep DNA. All the mutant sequences (except ΔF508) that gave the correct results on Innogenetics test strips were confirmed by bidirectional DNA sequencing using M13 primers flanking the vector's multiple cloning site and internal PCR primers. Sequenced clone DNA from the minipreps, diluted to 100,000 copies in 20 ng of yeast carrier in a volume of 2 μl, provided sufficient controls for validation of many cystic fibrosis test kit platforms.
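The copy-number dilution above rests on the standard conversion from DNA mass to molecules. The sketch below assumes an average of 650 g/mol per base pair, a hypothetical 4,000 bp plasmid (about 3.5 kb of vector plus a 0.5 kb insert), and a hypothetical 50 ng/μl miniprep stock; measured values should be substituted in practice.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair

def copies_per_ng(length_bp):
    """Molecules of double-stranded DNA per nanogram for a given length."""
    return 1e-9 * AVOGADRO / (length_bp * BP_G_PER_MOL)

def dilution_factor(stock_ng_per_ul, length_bp, target_copies, target_volume_ul):
    """Fold dilution that brings a stock to target_copies in target_volume_ul."""
    stock_copies_per_ul = stock_ng_per_ul * copies_per_ng(length_bp)
    return stock_copies_per_ul / (target_copies / target_volume_ul)

plasmid_bp = 4000  # hypothetical: ~3.5 kb vector + ~0.5 kb insert
stock = 50.0       # hypothetical miniprep concentration, ng/ul
fold = dilution_factor(stock, plasmid_bp, target_copies=100_000, target_volume_ul=2.0)
print(f"{copies_per_ng(plasmid_bp):.2e} copies/ng; dilute about {fold:.2e}-fold")
```

For scale, 100,000 copies of a 4 kb plasmid weigh roughly 0.4 pg, so the 20 ng of yeast carrier dominates the nucleic acid mass in each control aliquot; the carrier does not enter the copy-number calculation.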

The bidirectional sequencing results overlapped, with at least one readable sequence in each direction for each of the 9 clones in the first panel of homozygous controls. The sequences support two principles: (1) all PCR amplified products from cell lines or artificially synthesized DNA fragments must be sequenced to verify that a reliable control has been synthesized; and (2) because a great majority of genes have homologous active gene or pseudogene sequences elsewhere in the genome, primers must be selected very carefully for both the assay and the control to avoid obtaining heterozygous test results when analyzing total genomic DNA from a patient with a homozygous mutation. Below is a summary of the sequencing and analysis of the first 9 constructed clones, which carry 29 different cystic fibrosis mutations at 32 cloned sites. These can be used in two mixes to test each panel mutation with a homozygous control. One of the nine clones has an unreported 622-194G/A variation in intron 4 that is 284 basepairs from the 711+1G->T mutation in exon 5. Three variations found among these 6596 cloned DNA basepairs are perfect matches to normal CFTR (cystic fibrosis transmembrane conductance regulator) gene sequences in the published databases.

DNA from a Patient with a Typical ΔF508 Homozygous Mutation

Of the 29 mutations cloned, all 28 synthesized using selected PCR primers on a normal DNA template were perfect at the mutation site. We attribute the high fidelity of PCR amplification and mutation insertion to using only high-quality synthesized PCR primers (Invitrogen) and Pfu DNA Polymerase or Pfu Ultra High-Fidelity DNA Polymerase, which have been reported to amplify target DNA sequences with the highest fidelity among all thermostable DNA polymerases. The single remaining mutation, ΔF508, was cloned from DNA of a patient homozygous for ΔF508. Because one of the authors had occasion to sequence a rare ΔF508 patient mutation deleting TTT, we also synthesized this mutation using PCR primers:

Allele          Amino acids   Nucleotides
Normal          I  I  F       ATC ATC TTT
Typical ΔF508   I  I  ΔF      ATC AT- --T
ΔI507           I  ΔI F       ATC --- TTT
Rare ΔF508      I  I  ΔF      ATC ATC ---
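The table can be checked by applying each deletion to the normal sequence and translating. The sketch uses only the three codons needed for this window (standard genetic code). It shows that the typical and rare ΔF508 alleles encode the same amino acids (I-I, with F508 lost) yet differ at the DNA level (ATC ATT versus ATC ATC), which is why a sequence-specific probe for one need not detect the other.

```python
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F"}  # only the codons needed for this window
NORMAL = "ATCATCTTT"  # codons 506-508: I, I, F, as in the table

def delete(seq, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]

def translate(seq):
    """Translate complete codons; a KeyError flags a codon outside this small table."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))

alleles = {
    "normal": NORMAL,
    "typical dF508": delete(NORMAL, 5, 3),  # ATC AT- --T (loses CTT)
    "dI507": delete(NORMAL, 3, 3),          # ATC --- TTT
    "rare dF508": delete(NORMAL, 6, 3),     # ATC ATC --- (loses TTT)
}
for name, seq in alleles.items():
    print(f"{name:14s} {seq:9s} {translate(seq)}")
```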

Adjacent to each of the cloned fragments containing the typical and the rare ΔF508 mutation is the Intron10/Exon11 fragment with a normal G551 site and mutant 1717-1G->A, G542X, R553X, and R560T sites.

Another clone of note has three inserts. The center insert, carrying the 3849+10 kb C->T mutation, has otherwise normal sequences. The right insert, carrying the mutant N1303K sequence in exon 21, has otherwise normal exonic and 5′ sequences. In contrast, beginning 56 basepairs downstream in intron 21, 11 basepair substitutions were found. BLAST searches and thorough analysis of the entire cloned sequence showed that two 130 basepair sequences exist in tandem downstream from exon 21 into intron 21. These tandem repeats are separated by 32 basepairs. Interestingly, the tandem duplicated sequences match each other perfectly in this clone. The observed sequences are reminiscent of the reported alpha-thalassemia mutations, which represent 95% of the mutations in patients with this genetic disease.

The normal alpha globin gene cluster is arranged as zeta2-zeta1-pseudoalpha-alpha2-alpha1-theta globin. After unequal recombination between a normal alpha2 gene and a normal alpha1 gene, one chromosome carries zeta2-zeta1-pseudoalpha-alpha2-alpha1/alpha2-alpha1-theta globin genes and the other carries zeta2-zeta1-pseudoalpha-alpha2/alpha1-theta globin genes.

In considering whether this could be an unaltered sequence from a normal donor or a cloning artifact, the authors concluded that these duplicated regions most likely represent a normal allelic variant present in the donor DNA. This conclusion was based on the alpha-thalassemia mutations reported in the duplicated alpha globin gene region, as well as on human, African chimpanzee, and macaque sequences identified by BLAST searches of the genome databases.

If the first 130 bp sequence is designated “A” and the second 130 bp sequence with 11 basepair substitutions is designated “B”, then the normal allele is Exon 21-A-B. Most of the human genome database sequences, including the sequence reported for exon 21, have this pattern. Three human PAC clones in the database have the pattern Exon 21-A; the B sequence is missing. This is consistent with meiotic recombination between homologous downstream 130 bp sequences in A and B that deleted most of the B sequence. Most interestingly, the macaque has the sequence Exon 21-A-B′/A. This is consistent with meiotic recombination between B′ and A giving three copies of the 130 basepair repeat: Exon 21-A-B′/A-B. Subsequent meiotic recombination between a normal Exon 21-A-B sequence and the sequence with three 130 bp copies would give the sequences Exon 21-A-B′/A and Exon 21-A-B′-B′. Therefore CFTR alleles with three copies of the 130 bp sequence are predicted to exist. This is similar to the fact that a normal alpha globin gene cluster has two adult alpha globin genes, but about 1 in 50 people have alleles with three copies of the adult alpha globin genes.
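The recombination arguments in the preceding paragraphs can be made concrete with a schematic model in which each chromosome is a list of repeat units and a crossover between misaligned homologous units yields one hybrid unit on each product. This is a simplification that ignores where within a unit the breakpoint falls; the unit names follow the text, and prime marks are dropped.

```python
def unequal_crossover(chrom1, chrom2, i, j):
    """Reciprocal crossover between misaligned homologous units chrom1[i] and chrom2[j].

    Each product carries one hybrid unit written "x/y". When i > j the first
    product gains a unit and the second loses one.
    """
    first = chrom1[:i] + [f"{chrom1[i]}/{chrom2[j]}"] + chrom2[j + 1:]
    second = chrom2[:j] + [f"{chrom2[j]}/{chrom1[i]}"] + chrom1[i + 1:]
    return first, second

alpha = ["zeta2", "zeta1", "pseudoalpha", "alpha2", "alpha1", "theta"]
tri, single = unequal_crossover(alpha, alpha, 4, 3)  # alpha1 pairs with alpha2
print("-".join(tri))     # zeta2-zeta1-pseudoalpha-alpha2-alpha1/alpha2-alpha1-theta
print("-".join(single))  # zeta2-zeta1-pseudoalpha-alpha2/alpha1-theta

cftr = ["exon21", "A", "B"]
three_copy, one_copy = unequal_crossover(cftr, cftr, 2, 1)  # B pairs with A
print("-".join(three_copy))  # exon21-A-B/A-B
print("-".join(one_copy))    # exon21-A/B
```

In this model the one-copy product corresponds to the Exon 21-A pattern seen in the PAC clones, and the three-copy product to the predicted three-repeat CFTR allele.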

Given that some patients have the Exon21-B-B genotype, one could argue strongly that the cloned control sequence is the most robust control for the N1303K mutation that could have been constructed, because every test primer pair should be able to prime a unique location when testing for the N1303K mutation and quantifying the products.

The other insert in the same clone, carrying the A455E mutation in exon 9 synthesized using A455E PCR primers, has perfect exonic and 5′ sequences. Furthermore, 7 basepair substitutions were found past the splice site in intron 9, 6 of which have also been found in the CFTR pseudogene on chromosome 20. This contrasts with the 10 nucleotide changes in the CFTR pseudogene on chromosome 20. Therefore two of these changes can be used to design CFTR exon-specific primers: 644+91G->T and 644+142A->C. Of the 7 basepair changes mapped to the cloned fragment, the same 5 changes match CFTR-gene-like intron 9 sequences mapped to both chromosome 15 and chromosome 20, so additional attention may have to be given to the chromosome 15 sequence. Three explanations can be advanced for the cloned sequence: (1) a cloning artifact, (2) meiotic recombination in the normal donor between the pseudogene and the active CFTR gene's intronic sequences, or (3) changes that occurred in an active CFTR allele prior to duplication of the CFTR locus on chromosome 15 and dispersion of the CFTR intron 9 sequences to chromosome 20. In any event, 98% of the cloned intron 9 sequence is identical to the common intron sequence, and the exonic and 5′ sequences are identical. We therefore propose that the clone be distributed with its entire sequence made available to each user laboratory, so that each laboratory can determine how the clone will work with its own assay. We have submitted these clones for FDA approval.
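Primers that favour the active gene over its paralogs are commonly designed so that the 3′-terminal base falls on a position that differs between them. The sketch below finds such positions in pre-aligned, equal-length sequences and checks a candidate forward primer against every paralog. The 20-base sequences are toy stand-ins, not the real intron 9, chromosome 15, or chromosome 20 sequences.

```python
def mismatches(a, b):
    """0-based positions where two aligned, equal-length sequences differ (no gaps)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def three_prime_discriminates(target, paralogs, start, length):
    """True if a forward primer copying target[start:start + length] ends on a base
    that differs from every paralog at that aligned position."""
    end = start + length - 1
    return all(target[end] != p[end] for p in paralogs)

# Toy aligned sequences (hypothetical).
cftr = "ACGTTGCAGGTACCGTAAGT"
chr20 = "ACGTTGCAGTTACCGTAACT"
chr15 = "ACGTTGCAGTTACCGTAAGT"

print(mismatches(cftr, chr20), mismatches(cftr, chr15))
print(three_prime_discriminates(cftr, [chr20, chr15], 0, 10))  # 3' end on position 9
print(three_prime_discriminates(cftr, [chr20, chr15], 0, 19))  # 3' end on position 18
```

In the toy data, position 18 separates the target from the chromosome 20 copy but not from the chromosome 15 copy, which illustrates why both paralogs need to be checked.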

DISCUSSION

These cloned controls, when diluted with yeast mRNA, mimic total genomic DNA with homozygous CFTR mutations at multiple sites and test PCR amplification in multiplex reactions. This is important because controls diluted to the number of target CFTR copies at each tested location can be taken to the PCR set-up area with no more concern for cross-contamination of other tests than for the unknown genomic DNA samples tested simultaneously (Barker & European Consortium). By contrast, cross-contamination remains a primary concern for controls that have not been diluted with carrier RNA or DNA.

Multiplex PCR amplification reactions typically cannot be relied upon to amplify every existing target site so that all can be visualized after a typical 10⁶-fold PCR amplification. For instance, one of us (RVL) developed the first multiplex PCR test for Y chromosome deletions and applied it clinically. This 15-site test was assayed in 3 groups of 5 PCR target sites. In 105 patient assays, we found 12 patients with deletions of one or more adjacent targets, which were verified by repeating individual PCR analysis at each site that failed to amplify during the initial screen. These deletions were reported when multiple physically adjacent sites failed to amplify in different multiplex PCR amplification reactions. However, another 11 of the 105 samples had individual reactions that failed to amplify existing targets at two or more sites separated by sites that did amplify. These sites could always be amplified by repeating the initially failed PCR reaction with primers for the single failed site and varying amounts of total genomic DNA sample: 0.1×, 1×, or 10×. On rare occasions the PCR reaction had to be repeated a second time to obtain a unique-size product that could be visualized following electrophoresis, EtBr staining, and UV excitation. When controls are not long enough to be amplified by the assay's PCR primers, an extra electrophoretic assay step is typically added to determine whether each target site has been amplified. Furthermore, for some mutations (e.g. 3849+10 kb), a control that fails to provide equal numbers of normal and mutant target sequences cannot determine whether the normal sequence amplified typically while the abnormal sequence failed to amplify sufficiently to be visualized.
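The interpretation rule described above, in which failures at physically adjacent sites suggest a deletion while scattered failures suggest an amplification dropout, can be sketched as a simple grouping step. This is a hypothetical illustration with made-up site names, not a clinical algorithm; as the text notes, every failed site was confirmed by single-site PCR, and a genuine single-site deletion would still appear here as an isolated failure.

```python
def classify_failures(site_order, failed):
    """Group failed sites into runs of physically adjacent sites.

    Runs of two or more adjacent failures are returned as candidate deletions (to be
    confirmed by single-site PCR); isolated failures are returned for single-site
    repeats with 0.1x, 1x and 10x template.
    """
    runs, current = [], []
    for site in site_order:
        if site in failed:
            current.append(site)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    deletions = [run for run in runs if len(run) >= 2]
    repeats = [run[0] for run in runs if len(run) == 1]
    return deletions, repeats

sites = [f"site{i}" for i in range(1, 16)]  # 15 hypothetical sites in physical order
deletions, repeats = classify_failures(sites, {"site3", "site4", "site9", "site12"})
print("candidate deletions:", deletions)
print("repeat at 0.1x, 1x, 10x template:", repeats)
```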

Our controls are long enough to serve as internal, simultaneous multiplex PCR amplification controls as well as mutant sequence controls that have been verified by sequencing, which is considered the gold standard of DNA sequence validation. We have learned that multiple multiplex cystic fibrosis test formats have different characteristics and that failure to detect a control mutant sequence can most likely be explained by primer site location or nonspecific hybridization. In contrast, Bajjani and Amos have prepared multiple additional cystic fibrosis controls by synthesizing 100 bp fragments. This approach allows one to add the primers to control tubes after PCR amplification. The proportion of correct full-length synthesized sequences typically drops precipitously as the synthesized length exceeds 60 basepairs. Thus the reliability of 100 bp synthesized controls is anticipated to be less than perfect. Furthermore, the concentration of the synthetic primers can only be estimated. If one requires that the most robust controls be sequenced, then bidirectional sequencing would need to be completed before any of the synthesized aliquots were exhausted and had to be resynthesized. Since readable sequence typically begins about 30-40 basepairs downstream from the end of the target, 100 bp synthesized sequences would be verified in only one direction. Clearly, a control for each mutant site is preferable to no control, and additional mutation controls prepared by this method can be used until cloned controls are developed. However, little additional effort is required to synthesize controls from four selected primer sites and total normal genomic DNA that give a PCR product sufficient to control for PCR amplification whenever any mutant sequence is prepared.
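The length dependence of chemical synthesis can be illustrated with the usual approximation that the full-length fraction equals the per-step coupling efficiency raised to the number of couplings. Assuming a coupling efficiency of 99% (a typical figure chosen for illustration, not one from the text), roughly 55% of 60-mers but only about 37% of 100-mers would be full length before purification.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Approximate fraction of full-length oligonucleotide: one coupling per base after the first."""
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling efficiency must be in (0, 1]")
    return coupling_efficiency ** (length_nt - 1)

for n in (20, 60, 100):
    print(f"{n:>3} nt: {full_length_fraction(n):.0%} full length")
```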

In conclusion, we have synthesized 433 bp to 933 bp sequences containing the 25 ACMG-recommended mutation panel in 17 fragments inserted into 9 vector cloning sites. The entire sequences and adjacent cloning sites have been verified. When 10,000 copies of each vector have been diluted with 10 ng of yeast mRNA, these artificial mixtures can be added to test tubes alongside unknown samples in the PCR set-up area to control for multiplex cystic fibrosis test kits. Individual idiosyncrasies of the test formats for which these may be employed will depend upon the precise PCR primer sites, whether these sites include normally variant sequences that interfere with PCR amplification, whether the primer pairs also amplify homologous genomic DNA sequences at other locations including the pseudogene sequence on chromosome 15, and the relative amplification of each site during PCR under the different salt conditions found in different extracted DNA samples. Nevertheless, this reported artificial construct with all 25 

1. A method of optimizing quality control in a genetic test assay, the method comprising the steps of: testing for the presence of a normal gene nucleotide sequence portion at a pre-selected gene locus; testing for the presence of at least a first mutant gene nucleotide sequence portion at the pre-selected gene locus; and testing for interference by at least a first homologous nucleotide sequence portion.
 2. The method of claim 1, wherein the genetic test assay is a hybridization based assay.
 3. The method of claim 1, wherein the step of testing for interference by the homologous nucleotide sequence portion involves the step of providing a homologous nucleotide sequence control.
 4. The method of claim 3, wherein the homologous nucleotide sequence control includes at least a first homologous nucleotide sequence portion which is homologous to the normal gene nucleotide sequence of the pre-selected gene locus, and is devoid of a hybridizing nucleotide sequence portion of the pre-selected gene locus, wherein the hybridizing nucleotide sequence portion is sufficiently large to prevent detection of the pre-selected gene locus by the assay.
 5. The method of claim 4, wherein the hybridizing nucleotide sequence portion is the normal gene nucleotide sequence portion.
 6. The method of claim 4, wherein the hybridizing nucleotide sequence portion is substantially the entire gene nucleotide sequence of the pre-selected gene locus.
 7. A control comprising: at least a first nucleotide sequence, wherein the at least a first nucleotide sequence includes at least a first homologous nucleotide sequence portion and wherein the at least a first nucleotide sequence lacks a sufficiently large segment of the gene nucleotide sequence at an at least a first pre-selected gene locus to preclude detection of the gene nucleotide sequence at the at least a first pre-selected gene locus by an assay, and adapted for use in the assay, wherein the assay is for the detection of mutations in the gene nucleotide sequence at the at least a first pre-selected gene locus.
 8. The control of claim 7, wherein the at least a first nucleotide sequence is total genomic DNA having the sufficiently large segment of the gene nucleotide sequence at the at least a first pre-selected gene locus removed.
 9. The control of claim 8 further comprising: at least a second nucleotide sequence containing a sufficiently large segment of an at least a first pre-selected mutant gene nucleotide sequence so as to be detectable by the assay.
 10. The control of claim 9 wherein the at least a second nucleotide sequence contains sufficiently large segments of at least second and third pre-selected mutant gene nucleotide sequences so as to be detectable by the assay.
 11. The control of claim 10 wherein the number of copies of the at least a first nucleotide sequence is approximately equal to the number of copies of the at least a second nucleotide sequence.
 12. A control comprising: a first nucleotide sequence portion, wherein the first nucleotide sequence portion includes at least a first homologous nucleotide sequence portion having a sufficient length to adequately imitate corresponding normal or mutant nucleotide sequence portions found at a pre-selected gene locus being tested in an assay and containing at least a first distinct nucleotide species, wherein the at least a first distinct nucleotide species is not found in either the normal or mutant nucleotide sequence portions and wherein the at least a first distinct nucleotide species is suitably distinct as to provide a means for confirmation that the first homologous nucleotide sequence portion is not being detected in the assay by a normal or a mutant sequence primer used in the assay; and a second nucleotide sequence portion, wherein the second nucleotide sequence portion includes a sufficiently large segment of a first mutant gene nucleotide sequence portion found at the pre-selected gene locus so as to be detectable by the assay.
 13. The control of claim 12, further comprising: at least a second homologous nucleotide sequence portion substantially adjacent the first nucleotide sequence portion, wherein the at least a second homologous nucleotide sequence portion has a sufficient length to adequately imitate corresponding normal or mutant nucleotide sequence portions of the pre-selected gene locus being tested in the assay and containing at least a first distinct nucleotide species, wherein the at least a first distinct nucleotide species is not found in either the normal or mutant nucleotide sequence portions and wherein the at least a first distinct nucleotide species is suitably distinct as to provide a means for confirmation that the at least a second homologous nucleotide sequence portion is not being detected in the assay by a normal or a mutant sequence primer used in the assay.
 14. The control of claim 13, wherein the second nucleotide sequence portion further comprises: a sufficiently large segment of an at least a second mutant gene nucleotide sequence portion found at the pre-selected gene locus so as to be detectable by the assay.